This incorporates some code from mlocate's updatedb, and thus is compatible
with /etc/updatedb.conf, and supports all the pruning options from it.
All the code has been heavily modified, e.g. the gnulib dependency has been
removed and replaced with STL code (kicking 10k+ lines of code), the bind
mount code has been fixed (it was all broken since the switch from /etc/mtab
to /proc/self/mountinfo) and everything has been reformatted. Like with mlocate,
plocate's updatedb is merging, ie., it can skip readdir() on unchanged
directories. (The logic here is also copied pretty verbatim from mlocate.)
updatedb reads plocate's native format; there's a new max_version 2 that
contains directory timestamps (without it, updatedb will fall back to a full
scan). The timestamps increase the database size by only about 1%, which is a
good tradeoff when we're getting rid of the entire mlocate database.
We liberally use modern features to simplify the implementation; in particular,
openat() to avoid race conditions, instead of mlocate's complicated chdir() dance.
Unfortunately, the combination of the slightly strange storage order from mlocate,
and openat(), means we can need to keep up a bunch of file descriptors open,
but they are not an expensive resource these days, and we try to bump the
limit ourselves if we are allowed to. We also use O_TMPFILE, to make sure we
never leave a half-finished file lying around (mlocate's updatedb tries to
catch signals instead). All of this may hinder portability, so we might ease up
on the requirements later. We don't use io_uring for updatedb at this point.
plocate-build does not write the needed timestamps, so the first upgrade from
mlocate to native plocate requires a full rescan.
NOTE: The format is _not_ frozen yet, and won't be until actual release.
By opening with O_TMPFILE, we guarantee we'll never be leaving
an unfinished file visible on the filesystem. The move across the
old one isn't atomic, but the window of failure is very small now.
The manpage claims the return value should be 0 on a null byte,
just like on Linux, but in practice, it returns -1, so we need to
check for end-of-string manually.
Escape unprintable characters when outputting filenames to a terminal.
Filenames are generally untrusted, and can contain any kind of cruft.
In particular, there have been terminals (hopefully not in wide use anymore!)
that will do insanity like running specific commands when seeing a
specific escape sequence. More prosaically, embedded newlines can
make for confusing output.
Thus, escape any nonprintable characters in a shell-parseable way,
much the same way GNU ls does these days. Also escape quotes, backslashes
and the likes to make sure nothing unescaped looks like it's escaped.
This doesn't mean it's safe to take whatever and parse it uncritically
(we don't escape $, for instance), but it's generally good enough.
Escaping is disabled when doing zero-terminated output, or when printing
to a pipe or file.
When we have a scan that we cannot accelerate with trigrams
(very short patterns, or regexes), we need to go through all of
the file names like mlocate does. This is usually CPU-bound,
so fire up threads. We leave one core/hyperthread for the I/O
and add a thread for each of the rest (this is probably bad
on dualcore, but it's a simple thing that will do for now,
and should be fairly safe).
The bottleneck now is Serializer. I first tried just putting a
mutex on it, which worked fine on eight hyperthreads
(ie., four real cores, my laptop), but caused huge contention with 40
(20 cores, my old dual-socket Haswell). Sending data back through
per-thread queues seems to work a lot better, but we're still
spending a lot of time in Serializer; witness that --count is
much faster for such a search.
Since we have small strings, they can benefit from some shared context,
and zstd supports this. plocate-build now reads the mlocate database
twice; the first pass samples 1000 random blocks, which it uses to train
a 1 kB dictionary. (zstd recommends much larger dictionaries, but practical
testing seems to indicate this doesn't help us much, and might actually
be harmful.)
We get ~20% slower builds and ~7% smaller .db files -- but more
interestingly, linear search speed is up ~20% (which indicates that
decompression in itself benefits more). We need to read the 1 kB
dictionary, but it's practically free since it's stored next to the
header and so small.
This is a version bump (to version 1), so we're not forward-compatible,
but we're backward-compatible (plocate still reads version 0 files
just fine). Since we're adding more fields to the header anyway,
we can add a new “max_version” field that allows for marking
backwards-compatible changes in the future, ie., if plocate-build
adds more information that plocate would like to use but that older
plocate versions can simply ignore.
Do the access checking asynchronously if possible.
There are many issues involved:
- There's no access() support in io_uring (yet?), so we fake it
by doing statx() on the directory first, which primes the
dentry cache so that synchronous access() becomes very fast.
It is a bit tricky, since multiple access checks could be
going on at the same time, which the need to all wait
for the same statx() call.
- Not even all kernels support statx() in io_uring (support starts
from 5.6+).
- Serialization now becomes two-level, and more involved.
We don't have an obvious single counter anymore, so we need
to be able to start a docid without knowing how many candidates
there are (and thus, be able to tell Serializer that we are
at the end).
- Limit becomes more tricky, since there can be more calls on
the way back. We solve this by moving limit into Serializer,
and hard-exiting when we hit the limit.
- We need to prioritize statx() calls ahead of read(), so that
we don't end up with very delayed output when the new read()
calls generate even more statx() calls and we get a huge
backlog of calls. (We can't prioritize in the kernel, but we
can on the overflow queue we're managing ourselves.) This is
especially important with --limit.
This matches mlocate behavior; even the sort-of strange behavior
of having them non-anchored. Case-insensitive matching has also
been changed away from regex, since fnmatch() is seemingly slightly
faster.
Without changing the database format, this causes a bunch of extra
lookups. But somehow, it appears to go fairly well in practice.
Of course, case-sensitive will always be faster.