Do the access checking asynchronously if possible.
There are many issues involved:
- There's no access() support in io_uring (yet?), so we fake it
by doing statx() on the directory first, which primes the
dentry cache so that synchronous access() becomes very fast.
It is a bit tricky, since multiple access checks could be
going on at the same time, which the need to all wait
for the same statx() call.
- Not even all kernels support statx() in io_uring (support starts
from 5.6+).
- Serialization now becomes two-level, and more involved.
We don't have an obvious single counter anymore, so we need
to be able to start a docid without knowing how many candidates
there are (and thus, be able to tell Serializer that we are
at the end).
- Limit becomes more tricky, since there can be more calls on
the way back. We solve this by moving limit into Serializer,
and hard-exiting when we hit the limit.
- We need to prioritize statx() calls ahead of read(), so that
we don't end up with very delayed output when the new read()
calls generate even more statx() calls and we get a huge
backlog of calls. (We can't prioritize in the kernel, but we
can on the overflow queue we're managing ourselves.) This is
especially important with --limit.