Switch trigram lookup from binary search to a hash table.
Binary search was fine when we just wanted simplicity, but for I/O
optimization on rotating media, we want as few seeks as possible.
A hash table with open addressing gives us just that; Robin Hood
hashing lets us bound the maximum probe length, so we can just read
256 bytes (plus a little slop) per lookup and be done. This shaves
off roughly 30 ms cold-cache.
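The core trick can be sketched as below. This is an illustrative Robin Hood table, not plocate's actual on-disk layout; the class and field names are made up, but the invariant is the real one: no entry ever sits more than a fixed number of slots past its home slot, so a lookup only ever needs one bounded read.

```cpp
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Sketch of Robin Hood open addressing with a hard cap on probe length.
struct Entry {
	uint32_t key = 0;  // 0 = empty slot (assume real keys are nonzero)
	uint32_t value = 0;
};

class RobinHoodTable {
public:
	explicit RobinHoodTable(size_t num_slots) : slots_(num_slots) {}

	// Returns false if inserting would exceed the probe-length cap;
	// a real builder would then grow the table and retry.
	bool insert(uint32_t key, uint32_t value)
	{
		Entry e{ key, value };
		uint32_t dist = 0;  // how far e currently is from its home slot
		for (;;) {
			if (dist > kMaxProbe) return false;
			size_t idx = (hash(e.key) + dist) % slots_.size();
			if (slots_[idx].key == 0) {
				slots_[idx] = e;
				return true;
			}
			// Robin Hood: displace entries that are closer to home than we are.
			uint32_t their_dist = probe_distance(idx);
			if (their_dist < dist) {
				std::swap(e, slots_[idx]);
				dist = their_dist;
			}
			++dist;
		}
	}

	// Touches at most kMaxProbe+1 consecutive slots, so a reader can
	// fetch that entire window with a single bounded read.
	std::optional<uint32_t> find(uint32_t key) const
	{
		for (uint32_t dist = 0; dist <= kMaxProbe; ++dist) {
			size_t idx = (hash(key) + dist) % slots_.size();
			if (slots_[idx].key == 0) return std::nullopt;
			if (slots_[idx].key == key) return slots_[idx].value;
		}
		return std::nullopt;
	}

	static constexpr uint32_t kMaxProbe = 31;  // 32 slots * 8 bytes = 256 bytes

private:
	static size_t hash(uint32_t key) { return key * 2654435761u; }
	uint32_t probe_distance(size_t idx) const
	{
		size_t home = hash(slots_[idx].key) % slots_.size();
		return uint32_t((idx + slots_.size() - home) % slots_.size());
	}
	std::vector<Entry> slots_;
};
```

With 8-byte entries and a probe cap of 31, the 32-slot window is exactly the 256 bytes mentioned above.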
This breaks the on-disk format, so we use the chance to add a magic
number and a proper header, giving us some more flexibility in case
we want to change the builder later.
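A header along these lines is what's meant; the field names, layout, and magic value here are assumptions for illustration, not the actual format:

```cpp
#include <cstdint>
#include <cstring>

// Illustrative on-disk header; fields and magic are placeholders.
struct Header {
	char magic[8];              // identifies the file type
	uint32_t version;           // bumped on incompatible format changes
	uint32_t hashtable_size;    // number of slots in the trigram hash table
	uint32_t max_probe_length;  // needed to size each bounded lookup read
	uint64_t hash_table_offset;       // byte offsets into the file
	uint64_t filename_index_offset;
};

constexpr char kMagic[8] = { '\0', 'p', 'l', 'o', 'c', 'a', 't', 'e' };

bool check_header(const Header &hdr, uint32_t expected_version)
{
	return memcmp(hdr.magic, kMagic, sizeof(kMagic)) == 0 &&
	       hdr.version == expected_version;
}
```

The version field is what buys the flexibility: an incompatible builder change just bumps it, and readers can reject files they don't understand instead of misparsing them.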
This moves to explicit, asynchronous I/O through io_uring (Linux 5.1+),
which speeds up cold-cache behavior on rotating media by 3x or so.
It also removes any issues we might have with not fitting into 32-bit
address spaces.
If io_uring is not available, we fall back to regular synchronous I/O.
For now, liburing is a required dependency, but it will be made
optional soon, for the benefit of older systems.
This was surprisingly undocumented (I had to read the preprocessed
TurboPFor source to figure it out), but we can now build posting
lists directly compressed in memory, 128 elements at a time, which
saves another 30% RAM in plocate-build.
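The chunked-building idea looks roughly like this. Varint coding stands in for TurboPFor's actual bit-packing (which this sketch does not attempt to reproduce), but the memory-saving structure is the same: only the current 128-element chunk is ever held uncompressed.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Build a posting list compressed incrementally, 128 docids at a time,
// instead of keeping the whole uncompressed list in memory.
class PostingListBuilder {
public:
	void add(uint32_t docid)
	{
		pending_.push_back(docid);
		if (pending_.size() == kChunkSize) flush();
	}
	std::string finish()
	{
		if (!pending_.empty()) flush();
		return std::move(encoded_);
	}

private:
	static constexpr size_t kChunkSize = 128;

	void flush()
	{
		// Delta-encode against the previous docid, then varint-encode.
		for (uint32_t docid : pending_) {
			uint32_t delta = docid - last_docid_;
			last_docid_ = docid;
			while (delta >= 0x80) {
				encoded_.push_back(char(delta | 0x80));
				delta >>= 7;
			}
			encoded_.push_back(char(delta));
		}
		pending_.clear();
	}

	std::vector<uint32_t> pending_;  // at most 128 uncompressed docids
	std::string encoded_;            // compressed list so far
	uint32_t last_docid_ = 0;
};

// Matching decoder, for round-trip checking.
std::vector<uint32_t> decode(const std::string &buf)
{
	std::vector<uint32_t> out;
	uint32_t last = 0;
	for (size_t i = 0; i < buf.size();) {
		uint32_t delta = 0;
		int shift = 0;
		uint8_t b;
		do {
			b = uint8_t(buf[i++]);
			delta |= uint32_t(b & 0x7f) << shift;
			shift += 7;
		} while (b & 0x80);
		last += delta;
		out.push_back(last);
	}
	return out;
}
```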
Hold compressed filenames more efficiently in memory.
std::string::resize() will rarely give memory back, so allocate the
string at the right size to begin with. That way, we don't carry
around a lot of slop, saving ~15% RAM during the build.
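In miniature, the difference is between growing a string piecemeal (each reallocation overshoots, and the excess capacity sticks around) and constructing it at its final size in one allocation. The function names here are illustrative:

```cpp
#include <string>

// Growing incrementally: each reallocation overshoots the needed size,
// and the leftover capacity ("slop") is rarely returned.
std::string make_buffer_grown(size_t len)
{
	std::string s;
	for (size_t i = 0; i < len; ++i) s.push_back('x');
	return s;
}

// Allocating at the known final size: one allocation, no slop.
std::string make_buffer_exact(size_t len)
{
	return std::string(len, 'x');
}
```

Exact capacity behavior is implementation-defined, but on common standard libraries the grown string's capacity overshoots by up to the growth factor (typically 2x), which is exactly the slop the commit is eliminating.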
This saves ~50% RAM in the build step, now that we have blocking
(there's a lot of deduplication going on), and seemingly also
~15% execution time, possibly because of less memory allocation
(I haven't checked thoroughly).
Group filenames into blocks of 32, and compress them with zstd level 6
(the level is fairly arbitrarily chosen). This compresses the repetitive
path information very well, and also allows us to have shorter posting
lists, as they can point into the blocks (allowing dedup).
32 was chosen after eyeballing some compressed sizes, looking for
diminishing returns and then verifying it didn't cost much in terms
of search performance.
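The blocking scheme could be sketched like this; the types and names are made up for illustration, and the actual zstd compression call is elided (the real builder would compress each blob at level 6):

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

constexpr size_t kBlockSize = 32;

struct FilenameIndex {
	// One (conceptually zstd-compressed) blob of up to kBlockSize
	// concatenated, NUL-separated filenames per block.
	std::vector<std::string> blocks;
};

FilenameIndex build_blocks(const std::vector<std::string> &filenames)
{
	FilenameIndex idx;
	for (size_t i = 0; i < filenames.size(); i += kBlockSize) {
		std::string blob;
		size_t end = std::min(i + kBlockSize, filenames.size());
		for (size_t j = i; j < end; ++j) {
			blob += filenames[j];
			blob += '\0';
		}
		// Real builder: compress blob with zstd level 6 here.
		idx.blocks.push_back(std::move(blob));
	}
	return idx;
}

// A trigram's posting list stores block numbers instead of filename
// numbers; consecutive filenames in the same block collapse into a
// single posting, which is where the dedup comes from.
std::vector<uint32_t> to_block_postings(const std::vector<uint32_t> &filename_ids)
{
	std::vector<uint32_t> blocks;
	for (uint32_t id : filename_ids) {
		uint32_t block = id / kBlockSize;
		if (blocks.empty() || blocks.back() != block) blocks.push_back(block);
	}
	return blocks;
}
```

The cost of pointing at blocks rather than individual filenames is that a match within a block requires decompressing and scanning all 32 names in it, which is why the block size has to be verified against search performance as described above.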