This can be useful if you have recently deleted some files
and don't want to do a database rebuild. Note that we don't
support the --nofollow option (--follow is the default),
since it's not clear what it would be useful for, and the mlocate
source code says it “looks like a historical accident”.
We support multiple -d arguments, single -d arguments with multiple
databases (colon-separated), and LOCATE_PATH. The latter two should
be compatible with mlocate, although we support escaping colons in
file paths and I believe mlocate does not.
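
As a rough illustration of the colon handling, here is a minimal sketch of
splitting such a list. The function name split_db_paths is hypothetical, and
the sketch ASSUMES that a backslash is the escape character and that empty
components are simply dropped; the real rules may differ on both points.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a colon-separated list of database paths.
// ASSUMPTION: a backslash escapes the following character, so "\:" yields
// a literal colon and "\\" a literal backslash.
// ASSUMPTION: empty components (as in "a::b") are dropped.
std::vector<std::string> split_db_paths(const std::string &list)
{
	std::vector<std::string> paths;
	std::string current;
	for (size_t i = 0; i < list.size(); ++i) {
		char ch = list[i];
		if (ch == '\\' && i + 1 < list.size()) {
			current.push_back(list[++i]);  // Take the escaped character literally.
		} else if (ch == ':') {
			if (!current.empty()) paths.push_back(current);
			current.clear();
		} else {
			current.push_back(ch);
		}
	}
	if (!current.empty()) paths.push_back(current);
	return paths;
}
```

With these assumptions, the input /var/lib/a.db:/srv/we\:ird.db splits into
/var/lib/a.db and /srv/we:ird.db. The same function could be applied to each
-d argument and to LOCATE_PATH.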
This is slightly trickier than one might expect due to security
considerations; we're not robust against malicious input, so we need to
make sure that if we process an attacker-supplied database, the process
has already dropped privileges and cannot subvert a privileged reader.
Some filesystems don't know from getdents() whether an entry is a file
or a directory without a stat(). I had assumed this was only an issue
for obscure operating systems, so I removed the fallback check (mlocate's
updatedb had one), but evidently older versions of XFS have this issue, too,
so add the check back.
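
The kind of check meant here can be sketched as follows. This is not the
actual updatedb code; entry_is_directory is a hypothetical helper, and the
choice not to follow symlinks is an assumption of the sketch.

```cpp
#include <cassert>
#include <dirent.h>
#include <fcntl.h>
#include <sys/stat.h>

// Returns true if `de`, an entry read from the directory open as `dir_fd`,
// is a directory. Uses d_type when the filesystem fills it in, and falls
// back to fstatat() when the filesystem reports DT_UNKNOWN.
bool entry_is_directory(int dir_fd, const dirent *de)
{
	if (de->d_type != DT_UNKNOWN) {
		return de->d_type == DT_DIR;
	}
	struct stat st;
	// ASSUMPTION: symlinks are not followed, so a symlink pointing to a
	// directory does not count as a directory.
	if (fstatat(dir_fd, de->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0) {
		return false;  // Entry vanished or is unreadable; treat as non-directory.
	}
	return S_ISDIR(st.st_mode);
}
```

The extra stat() is only paid on filesystems that actually return
DT_UNKNOWN, so the common case stays as cheap as before.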
This is a last-resort solution; we don't do unlink-on-signal or similar,
so if updatedb or plocate-build is aborted on such platforms, there
will be an orphan temporary file.
This incorporates some code from mlocate's updatedb, and thus is compatible
with /etc/updatedb.conf, and supports all the pruning options from it.
All the code has been heavily modified, e.g. the gnulib dependency has been
removed and replaced with STL code (dropping 10k+ lines of code), the bind
mount code has been fixed (it had been broken ever since the switch from /etc/mtab
to /proc/self/mountinfo) and everything has been reformatted. Like with mlocate,
plocate's updatedb is merging, i.e., it can skip readdir() on unchanged
directories. (The logic here is also copied pretty much verbatim from mlocate.)
updatedb reads plocate's native format; there's a new max_version 2 that
contains directory timestamps (without it, updatedb will fall back to a full
scan). The timestamps increase the database size by only about 1%, which is a
good tradeoff when we're getting rid of the entire mlocate database.
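
To make the role of the timestamps concrete, here is a minimal sketch of the
comparison that lets a merging updatedb skip readdir(). The struct and
function names are hypothetical, and the rule shown (exact mtime equality)
is an assumption; the logic inherited from mlocate may well be stricter.

```cpp
#include <cassert>
#include <sys/stat.h>
#include <time.h>

// Hypothetical record of what the previous database stored for a directory.
struct StoredDirTime {
	timespec mtime;
};

// True if the directory's current mtime matches the stored one, in which
// case the old listing can be reused instead of calling readdir() again.
// ASSUMPTION: exact equality of mtime is the whole test in this sketch.
bool can_reuse_listing(const struct stat &current, const StoredDirTime &stored)
{
	return current.st_mtim.tv_sec == stored.mtime.tv_sec &&
	       current.st_mtim.tv_nsec == stored.mtime.tv_nsec;
}
```

This works at all because creating, removing or renaming an entry updates
the mtime of the containing directory, and the database only cares about
names, not file contents.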
We liberally use modern features to simplify the implementation; in particular,
openat() to avoid race conditions, instead of mlocate's complicated chdir() dance.
Unfortunately, the combination of the slightly strange storage order from mlocate,
and openat(), means we may need to keep a bunch of file descriptors open,
but they are not an expensive resource these days, and we try to bump the
limit ourselves if we are allowed to. We also use O_TMPFILE, to make sure we
never leave a half-finished file lying around (mlocate's updatedb tries to
catch signals instead). All of this may hinder portability, so we might ease up
on the requirements later. We don't use io_uring for updatedb at this point.
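
Bumping the file descriptor limit can be sketched like this. The helper name
raise_fd_limit is hypothetical, and raising the soft limit all the way to
the hard limit is an assumption; the real code may pick a different target.

```cpp
#include <cassert>
#include <sys/resource.h>

// Raise the soft limit on open file descriptors to the hard limit, which an
// unprivileged process is allowed to do. Returns true if the soft limit
// equals the hard limit afterwards.
bool raise_fd_limit()
{
	rlimit lim;
	if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
		return false;
	}
	if (lim.rlim_cur == lim.rlim_max) {
		return true;  // Already as high as we are allowed to go.
	}
	lim.rlim_cur = lim.rlim_max;
	return setrlimit(RLIMIT_NOFILE, &lim) == 0;
}
```

A failure here need not be fatal; the caller can carry on with whatever
limit it already has.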
plocate-build does not write the needed timestamps, so the first upgrade from
mlocate to native plocate requires a full rescan.
NOTE: The format is _not_ frozen yet, and won't be until actual release.
By opening with O_TMPFILE, we guarantee that we never leave
an unfinished file visible on the filesystem. Moving the new file over the
old one isn't atomic, but the window of failure is very small now.
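
For reference, the general O_TMPFILE pattern (as documented in open(2))
looks roughly like the sketch below. It is Linux-specific and needs /proc;
write_then_publish and its parameters are hypothetical names, and it is not
meant to mirror the actual code.

```cpp
#include <cassert>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Write `data` to an unnamed file in `dir`, and give it the name
// `final_path` only once it is complete. `final_path` must not exist yet
// and must be on the same filesystem as `dir`.
bool write_then_publish(const std::string &dir, const std::string &final_path,
                        const std::string &data)
{
	int fd = open(dir.c_str(), O_WRONLY | O_TMPFILE, 0640);
	if (fd == -1) {
		return false;  // E.g. the filesystem does not support O_TMPFILE.
	}
	bool ok = write(fd, data.data(), data.size()) == (ssize_t)data.size() &&
	          fsync(fd) == 0;
	if (ok) {
		// Link the anonymous inode into the filesystem via /proc.
		std::string proc_path = "/proc/self/fd/" + std::to_string(fd);
		ok = linkat(AT_FDCWD, proc_path.c_str(), AT_FDCWD, final_path.c_str(),
		            AT_SYMLINK_FOLLOW) == 0;
	}
	close(fd);
	return ok;
}
```

If the process dies before linkat(), the kernel frees the unnamed file, so
nothing is left behind. Note that linkat() fails with EEXIST when the target
name already exists, so replacing an existing file takes an extra step.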
The manpage claims the return value should be 0 on a null byte,
just like on Linux, but in practice, it returns -1, so we need to
check for end-of-string manually.
Escape unprintable characters when outputting filenames to a terminal.
Filenames are generally untrusted, and can contain any kind of cruft.
In particular, there have been terminals (hopefully not in wide use anymore!)
that would do insane things like run specific commands on seeing a
specific escape sequence. More prosaically, embedded newlines can
make for confusing output.
Thus, escape any nonprintable characters in a shell-parseable way,
much the same way GNU ls does these days. Also escape quotes, backslashes
and the like to make sure nothing unescaped looks like it's escaped.
This doesn't mean it's safe to take whatever and parse it uncritically
(we don't escape $, for instance), but it's generally good enough.
Escaping is disabled when doing zero-terminated output, or when printing
to a pipe or file.
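
A simplified sketch of this kind of escaping follows. The quoting rules are
illustrative only and are NOT the exact ones used here or by GNU ls;
shell_escape and should_escape are hypothetical names, and the sketch assumes
that bytes of 0x80 and above (e.g. UTF-8) can be printed as-is.

```cpp
#include <cassert>
#include <stdio.h>
#include <string>
#include <unistd.h>

// Quote `name` so that a shell with $'...' support (e.g. bash) would read
// it back as the original bytes. Names without special characters are
// returned unchanged.
std::string shell_escape(const std::string &name)
{
	bool needs_quoting = false;
	for (unsigned char ch : name) {
		if (ch < 0x20 || ch == 0x7f || ch == ' ' || ch == '\'' || ch == '"' || ch == '\\') {
			needs_quoting = true;
		}
	}
	if (!needs_quoting) return name;

	std::string out;
	bool in_quotes = false;
	for (unsigned char ch : name) {
		if (ch < 0x20 || ch == 0x7f || ch == '\'') {
			// Control characters and single quotes go in their own $'\ooo' piece.
			if (in_quotes) { out += '\''; in_quotes = false; }
			char buf[16];
			snprintf(buf, sizeof(buf), "$'\\%03o'", (unsigned)ch);
			out += buf;
		} else {
			if (!in_quotes) { out += '\''; in_quotes = true; }
			out += (char)ch;
		}
	}
	if (in_quotes) out += '\'';
	return out;
}

// Escaping only applies to terminal output, and never to zero-terminated output.
bool should_escape(int fd, bool zero_terminated)
{
	return !zero_terminated && isatty(fd);
}
```

With these rules, a name consisting of a, a newline and b is printed as
'a'$'\012''b', while plain.txt is printed unchanged. As noted above, $ is
left alone, so the output is still not something to feed blindly to a shell.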