git.sesse.net Git - plocate/log
Steinar H. Gunderson [Thu, 29 Oct 2020 23:06:10 +0000 (00:06 +0100)]
Release plocate 1.0.6.

Steinar H. Gunderson [Thu, 29 Oct 2020 22:42:01 +0000 (23:42 +0100)]
Escape unprintable characters when outputting filenames to a terminal.

Filenames are generally untrusted, and can contain any kind of cruft.
In particular, there have been terminals (hopefully not in wide use anymore!)
that will do insanity like running specific commands when seeing a
specific escape sequence. More prosaically, embedded newlines can
make for confusing output.

Thus, escape any nonprintable characters in a shell-parseable way,
much the same way GNU ls does these days. Also escape quotes, backslashes
and the like to make sure nothing unescaped looks like it's escaped.
This doesn't mean it's safe to take whatever and parse it uncritically
(we don't escape $, for instance), but it's generally good enough.

Escaping is disabled when doing zero-terminated output, or when printing
to a pipe or file.
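
A rough sketch of this kind of escaping, for illustration only. It is not
plocate's actual code: escape_filename is a hypothetical name, the output uses
$'...' ANSI-C quoting (understood by bash and similar shells) for the whole
name rather than the mixed quoting GNU ls produces, and bytes >= 0x80 are
passed through without any locale check.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: return `name` unchanged if it is harmless, otherwise
// wrap it in $'...' with control characters, quotes and backslashes escaped.
std::string escape_filename(const std::string &name)
{
    bool unsafe = false;
    for (unsigned char ch : name) {
        if (ch < 0x20 || ch == 0x7f || ch == '\'' || ch == '"' || ch == '\\') {
            unsafe = true;
            break;
        }
    }
    if (!unsafe) {
        return name;
    }

    std::string out = "$'";
    for (unsigned char ch : name) {
        if (ch == '\n') {
            out += "\\n";
        } else if (ch == '\t') {
            out += "\\t";
        } else if (ch == '\\') {
            out += "\\\\";
        } else if (ch == '\'') {
            out += "\\'";
        } else if (ch < 0x20 || ch == 0x7f) {
            // Any other control character becomes a three-digit octal escape.
            char buf[8];
            std::snprintf(buf, sizeof(buf), "\\%03o", static_cast<unsigned>(ch));
            out += buf;
        } else {
            out += static_cast<char>(ch);  // Printable (or >= 0x80): keep as-is.
        }
    }
    out += "'";
    return out;
}
```

A caller would presumably apply this only when stdout is a terminal and
zero-terminated output is off, as described above; like the real thing, the
sketch leaves $ alone.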

Steinar H. Gunderson [Tue, 20 Oct 2020 16:55:37 +0000 (18:55 +0200)]
Fix a crash when we have too few blocks to train a dictionary.

Steinar H. Gunderson [Tue, 20 Oct 2020 16:53:58 +0000 (18:53 +0200)]
Support building databases from plaintext files.

This was already possible by uncommenting some code, but has now
been given a switch and also been made more robust.

Steinar H. Gunderson [Sat, 17 Oct 2020 13:03:39 +0000 (15:03 +0200)]
Add an alternative for __builtin_clz.

Speed isn't critical here, and this was ostensibly the last GCC-ism.
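
For illustration, a compiler-independent replacement for __builtin_clz can be
as simple as a shift loop. This is a sketch under the assumption that only
nonzero 32-bit inputs matter; portable_clz is a hypothetical name, not
necessarily what the commit uses.

```cpp
#include <cassert>
#include <cstdint>

// Count the leading zero bits of a nonzero 32-bit value by shifting left
// until the top bit is set. Slow compared to the builtin, but portable.
inline int portable_clz(uint32_t x)
{
    assert(x != 0);  // Like __builtin_clz, the result is undefined for zero.
    int n = 0;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        ++n;
    }
    return n;
}
```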

Steinar H. Gunderson [Sat, 17 Oct 2020 12:37:25 +0000 (14:37 +0200)]
Remove some unneeded __attribute__((unused)).

Steinar H. Gunderson [Sat, 17 Oct 2020 12:33:12 +0000 (14:33 +0200)]
Fix the function multiversioning Meson test.

The old one was seemingly too lenient, and would have false positives.

Steinar H. Gunderson [Sat, 17 Oct 2020 12:33:05 +0000 (14:33 +0200)]
Bump version number.

Steinar H. Gunderson [Sat, 17 Oct 2020 09:39:54 +0000 (11:39 +0200)]
Release plocate 1.0.5.

Steinar H. Gunderson [Sat, 17 Oct 2020 09:32:17 +0000 (11:32 +0200)]
Fix the -r short option.

Steinar H. Gunderson [Sat, 17 Oct 2020 09:10:41 +0000 (11:10 +0200)]
Support compiling on x86 platforms without working function multiversioning.

Steinar H. Gunderson [Sat, 17 Oct 2020 07:55:45 +0000 (09:55 +0200)]
clang-format.

Steinar H. Gunderson [Sat, 17 Oct 2020 07:55:18 +0000 (09:55 +0200)]
Add the missing end timing if linear scan and --debug are used together.

Steinar H. Gunderson [Sat, 17 Oct 2020 07:47:13 +0000 (09:47 +0200)]
Fix some inconsistencies in the man page.

Steinar H. Gunderson [Sat, 17 Oct 2020 07:46:55 +0000 (09:46 +0200)]
Implement the -b (--basename) option.

Steinar H. Gunderson [Fri, 16 Oct 2020 08:02:28 +0000 (10:02 +0200)]
Fix a wrong IWYU include.

Steinar H. Gunderson [Fri, 16 Oct 2020 08:01:28 +0000 (10:01 +0200)]
Fix detection of -latomic (it doesn't come from pkg-config).

Steinar H. Gunderson [Fri, 16 Oct 2020 07:26:41 +0000 (09:26 +0200)]
Add -latomic if it exists; seems to be required on armel and sh4.

Steinar H. Gunderson [Fri, 16 Oct 2020 07:27:14 +0000 (09:27 +0200)]
Bump the version number.

Steinar H. Gunderson [Thu, 15 Oct 2020 22:50:22 +0000 (00:50 +0200)]
Release plocate 1.0.4.

Steinar H. Gunderson [Thu, 15 Oct 2020 22:48:23 +0000 (00:48 +0200)]
Move the cache-flushing behavior into an undocumented option, so that one does not have to recompile to test it. (Drops setgid.)

Steinar H. Gunderson [Thu, 15 Oct 2020 22:42:25 +0000 (00:42 +0200)]
Move several needle/searching related functions into their own file.

Steinar H. Gunderson [Thu, 15 Oct 2020 22:36:20 +0000 (00:36 +0200)]
Move AccessRXCache into its own file.

Steinar H. Gunderson [Thu, 15 Oct 2020 22:23:11 +0000 (00:23 +0200)]
Run clang-format.

Steinar H. Gunderson [Thu, 15 Oct 2020 22:22:59 +0000 (00:22 +0200)]
Move Serializer into its own file.

Steinar H. Gunderson [Thu, 15 Oct 2020 21:41:16 +0000 (23:41 +0200)]
Merge non-results from worker threads to put less load on Serializer.

Steinar H. Gunderson [Thu, 15 Oct 2020 21:40:42 +0000 (23:40 +0200)]
Give the WorkerThread results a proper struct instead of std::tuple.

Steinar H. Gunderson [Thu, 15 Oct 2020 20:41:42 +0000 (22:41 +0200)]
Multithread linear scans.

When we have a scan that we cannot accelerate with trigrams
(very short patterns, or regexes), we need to go through all of
the file names like mlocate does. This is usually CPU-bound,
so fire up threads. We leave one core/hyperthread for the I/O
and add a thread for each of the rest (this is probably bad
on dualcore, but it's a simple thing that will do for now,
and should be fairly safe).

The bottleneck now is Serializer. I first tried just putting a
mutex on it, which worked fine on eight hyperthreads
(i.e., four real cores, my laptop), but caused huge contention with 40
(20 cores, my old dual-socket Haswell). Sending data back through
per-thread queues seems to work a lot better, but we're still
spending a lot of time in Serializer; witness that --count is
much faster for such a search.
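
The thread layout described above might be sketched roughly as follows. This
is not plocate's code: parallel_scan is a hypothetical name, the "scan" is a
plain substring match, and per-thread result vectors merged after join stand
in for the per-thread queues that feed Serializer.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Split `names` into contiguous chunks, scan each chunk on its own thread,
// and merge the per-thread results in chunk order (so output order is kept).
std::vector<std::string> parallel_scan(const std::vector<std::string> &names,
                                       const std::string &needle)
{
    // Leave one core/hyperthread for I/O, but always run at least one worker.
    unsigned hw = std::thread::hardware_concurrency();
    size_t num_threads = (hw > 1) ? hw - 1 : 1;
    num_threads = std::min(num_threads, std::max<size_t>(names.size(), 1));

    // Each worker writes only to its own vector, so no mutex is needed.
    std::vector<std::vector<std::string>> per_thread(num_threads);
    std::vector<std::thread> threads;
    for (size_t t = 0; t < num_threads; ++t) {
        threads.emplace_back([&names, &needle, &per_thread, t, num_threads] {
            size_t begin = names.size() * t / num_threads;
            size_t end = names.size() * (t + 1) / num_threads;
            for (size_t i = begin; i < end; ++i) {
                if (names[i].find(needle) != std::string::npos) {
                    per_thread[t].push_back(names[i]);
                }
            }
        });
    }
    for (std::thread &th : threads) {
        th.join();
    }

    std::vector<std::string> result;
    for (const std::vector<std::string> &chunk : per_thread) {
        result.insert(result.end(), chunk.begin(), chunk.end());
    }
    return result;
}
```

Keeping results thread-local until the end is the simplest way to avoid the
single-mutex contention mentioned above; a streaming design would hand results
back incrementally instead.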

Steinar H. Gunderson [Wed, 14 Oct 2020 22:56:37 +0000 (00:56 +0200)]
Don't flush the cache on plocate.db.

This was changed by mistake in an earlier patch.

Steinar H. Gunderson [Wed, 14 Oct 2020 22:56:23 +0000 (00:56 +0200)]
Bump version number.

Steinar H. Gunderson [Wed, 14 Oct 2020 22:13:52 +0000 (00:13 +0200)]
Release plocate 1.0.3.

Steinar H. Gunderson [Wed, 14 Oct 2020 21:35:54 +0000 (23:35 +0200)]
In plocate-build, open the file only once.

Steinar H. Gunderson [Wed, 14 Oct 2020 21:32:02 +0000 (23:32 +0200)]
If plocate-build cannot open the output file, give a proper error instead of crashing.

Steinar H. Gunderson [Wed, 14 Oct 2020 21:31:08 +0000 (23:31 +0200)]
Add some options for controlling installation and processing of the cron.daily script.

Steinar H. Gunderson [Wed, 14 Oct 2020 17:01:38 +0000 (19:01 +0200)]
Unbreak compilation for non-x86.

Steinar H. Gunderson [Wed, 14 Oct 2020 16:54:28 +0000 (18:54 +0200)]
Unbreak compilation of bench.

Steinar H. Gunderson [Tue, 13 Oct 2020 16:08:05 +0000 (18:08 +0200)]
Support --debug for plocate-build, and unbreak some debug printfs there.

Steinar H. Gunderson [Tue, 13 Oct 2020 15:55:53 +0000 (17:55 +0200)]
Fix --version in plocate-build.

Steinar H. Gunderson [Tue, 13 Oct 2020 15:46:20 +0000 (17:46 +0200)]
Use zstd dictionaries.

Since we have small strings, they can benefit from some shared context,
and zstd supports this. plocate-build now reads the mlocate database
twice; the first pass samples 1000 random blocks, which it uses to train
a 1 kB dictionary. (zstd recommends much larger dictionaries, but practical
testing seems to indicate this doesn't help us much, and might actually
be harmful.)

We get ~20% slower builds and ~7% smaller .db files -- but more
interestingly, linear search speed is up ~20% (which indicates that
decompression in itself benefits more). We need to read the 1 kB
dictionary, but it's practically free since it's stored next to the
header and so small.

This is a version bump (to version 1), so we're not forward-compatible,
but we're backward-compatible (plocate still reads version 0 files
just fine). Since we're adding more fields to the header anyway,
we can add a new “max_version” field that allows for marking
backwards-compatible changes in the future, i.e., if plocate-build
adds more information that plocate would like to use but that older
plocate versions can simply ignore.
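
One way to read the version/max_version split, sketched under loud
assumptions: the struct layout, field meanings and function names below are
hypothetical and not taken from plocate's actual header format. The sketch
treats "version" as the oldest format a reader must understand and
"max_version" as the newest format whose optional extras the file may carry.

```cpp
#include <cstdint>

// Hypothetical header excerpt; the real header has more fields.
struct DbHeader {
    uint32_t version;      // Assumed: readers older than this must refuse the file.
    uint32_t max_version;  // Assumed: extras up to this version may be present.
};

// The newest format this (imaginary) reader fully understands.
constexpr uint32_t kSupportedVersion = 1;

// A file is readable as long as its mandatory format is not too new.
inline bool can_read(const DbHeader &hdr)
{
    return hdr.version <= kSupportedVersion;
}

// True if the file carries optional data this reader will simply ignore.
inline bool has_newer_extras(const DbHeader &hdr)
{
    return hdr.max_version > kSupportedVersion;
}
```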

Steinar H. Gunderson [Mon, 12 Oct 2020 18:08:58 +0000 (20:08 +0200)]
Reuse zstd compression contexts, for a tiny speed boost.

Steinar H. Gunderson [Mon, 12 Oct 2020 18:04:26 +0000 (20:04 +0200)]
Bump version number.

Steinar H. Gunderson [Mon, 12 Oct 2020 07:56:32 +0000 (09:56 +0200)]
Release plocate 1.0.2.

Steinar H. Gunderson [Mon, 12 Oct 2020 07:56:21 +0000 (09:56 +0200)]
Add a NEWS file (pretty boring currently).

Steinar H. Gunderson [Mon, 12 Oct 2020 07:52:07 +0000 (09:52 +0200)]
Fix some 32-bit issues.

Steinar H. Gunderson [Sun, 11 Oct 2020 22:57:39 +0000 (00:57 +0200)]
Update the correct (generated) version of update-plocate.sh.

Steinar H. Gunderson [Sun, 11 Oct 2020 22:57:28 +0000 (00:57 +0200)]
Bump the version number.

Steinar H. Gunderson [Sun, 11 Oct 2020 22:22:51 +0000 (00:22 +0200)]
Release plocate 1.0.1.

Steinar H. Gunderson [Sun, 11 Oct 2020 22:22:38 +0000 (00:22 +0200)]
Make update-plocate.sh work properly if installed to /usr.

Steinar H. Gunderson [Sun, 11 Oct 2020 21:58:41 +0000 (23:58 +0200)]
Unbreak non-trigram matches after we changed to asynchronous access().

Non-trigram matches don't use async I/O, so they also can't use
async access(). Fix so that they don't segfault anymore.

Steinar H. Gunderson [Sun, 11 Oct 2020 19:33:23 +0000 (21:33 +0200)]
Correct section of plocate-build manpage.

Steinar H. Gunderson [Sun, 11 Oct 2020 19:33:13 +0000 (21:33 +0200)]
Bump version number.

Steinar H. Gunderson [Sun, 11 Oct 2020 18:45:23 +0000 (20:45 +0200)]
Release plocate 1.0.0.

Steinar H. Gunderson [Sun, 11 Oct 2020 17:59:11 +0000 (19:59 +0200)]
Do the access checking asynchronously if possible.

There are many issues involved:

 - There's no access() support in io_uring (yet?), so we fake it
   by doing statx() on the directory first, which primes the
   dentry cache so that synchronous access() becomes very fast.
   It is a bit tricky, since multiple access checks could be
   going on at the same time, which then all need to wait
   for the same statx() call.
 - Not even all kernels support statx() in io_uring (support starts
   from 5.6+).
 - Serialization now becomes two-level, and more involved.
   We don't have an obvious single counter anymore, so we need
   to be able to start a docid without knowing how many candidates
   there are (and thus, be able to tell Serializer that we are
   at the end).
 - Limit becomes more tricky, since there can be more calls on
   the way back. We solve this by moving limit into Serializer,
   and hard-exiting when we hit the limit.
 - We need to prioritize statx() calls ahead of read(), so that
   we don't end up with very delayed output when the new read()
   calls generate even more statx() calls and we get a huge
   backlog of calls. (We can't prioritize in the kernel, but we
   can on the overflow queue we're managing ourselves.) This is
   especially important with --limit.
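
The last point, prioritizing statx() over read() on a self-managed overflow
queue, could look something like the sketch below. Everything here is
hypothetical (the Op, Request and OverflowQueue names, and the use of two
deques); it only illustrates the ordering rule and involves no actual
io_uring calls.

```cpp
#include <deque>
#include <string>
#include <utility>

enum class Op { READ, STATX };

// Hypothetical stand-in for a request that could not be submitted yet.
struct Request {
    Op op;
    std::string what;
};

// Requests wait here while the kernel queue is full. statx() requests are
// always handed out before read() requests; each kind stays FIFO.
class OverflowQueue {
public:
    void push(Request req)
    {
        if (req.op == Op::STATX) {
            statx_queue_.push_back(std::move(req));
        } else {
            read_queue_.push_back(std::move(req));
        }
    }

    // Stores the next request in *out and returns true, or returns false
    // if nothing is pending.
    bool pop(Request *out)
    {
        std::deque<Request> &q = statx_queue_.empty() ? read_queue_ : statx_queue_;
        if (q.empty()) {
            return false;
        }
        *out = std::move(q.front());
        q.pop_front();
        return true;
    }

private:
    std::deque<Request> statx_queue_;
    std::deque<Request> read_queue_;
};
```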

Steinar H. Gunderson [Sun, 11 Oct 2020 17:57:20 +0000 (19:57 +0200)]
Use the PRId64 #define for formatting int64.

Steinar H. Gunderson [Sun, 11 Oct 2020 17:09:18 +0000 (19:09 +0200)]
Add debug output if io_uring initialization fails.

Steinar H. Gunderson [Sun, 11 Oct 2020 09:49:35 +0000 (11:49 +0200)]
Fix #include order.

Steinar H. Gunderson [Sun, 11 Oct 2020 09:49:00 +0000 (11:49 +0200)]
Remove some unneeded whitespace.

Steinar H. Gunderson [Sun, 11 Oct 2020 09:48:43 +0000 (11:48 +0200)]
Disallow limit <= 0.

Steinar H. Gunderson [Sun, 11 Oct 2020 08:11:43 +0000 (10:11 +0200)]
README updates.

Steinar H. Gunderson [Sun, 11 Oct 2020 08:07:38 +0000 (10:07 +0200)]
Add some man pages.

Steinar H. Gunderson [Sat, 10 Oct 2020 20:24:47 +0000 (22:24 +0200)]
Add support for some basic options in plocate-build; specifically, block size.

This also means it will stop segfaulting if no options are given.

Steinar H. Gunderson [Sat, 10 Oct 2020 20:24:27 +0000 (22:24 +0200)]
Implement support for larger basevals in TurboPFor.

Steinar H. Gunderson [Sat, 10 Oct 2020 19:43:20 +0000 (21:43 +0200)]
Support searching by regexp (brute force only).

Mostly for compatibility completeness.

Steinar H. Gunderson [Sat, 10 Oct 2020 19:26:30 +0000 (21:26 +0200)]
Write new --help text from scratch, so that we have nothing from mlocate except some structs.

Steinar H. Gunderson [Sat, 10 Oct 2020 18:39:44 +0000 (20:39 +0200)]
Add a --version option.

Steinar H. Gunderson [Sat, 10 Oct 2020 17:30:13 +0000 (19:30 +0200)]
Allow giving --debug to enable debugging (but drops setgid).

Steinar H. Gunderson [Sat, 10 Oct 2020 17:29:01 +0000 (19:29 +0200)]
Unbreak the --null long option.

Steinar H. Gunderson [Sat, 10 Oct 2020 17:18:52 +0000 (19:18 +0200)]
Use globs if there are wildcards in the pattern.

This matches mlocate behavior; even the sort-of strange behavior
of having them non-anchored. Case-insensitive matching has also
been changed away from regex, since fnmatch() is seemingly slightly
faster.

Steinar H. Gunderson [Sat, 10 Oct 2020 17:09:36 +0000 (19:09 +0200)]
Some clang-formatting.

Steinar H. Gunderson [Sat, 10 Oct 2020 09:36:17 +0000 (11:36 +0200)]
Support case-insensitive searches.

Without changing the database format, this causes a bunch of extra
lookups. But somehow, it appears to go fairly well in practice.
Of course, case-sensitive will always be faster.

Steinar H. Gunderson [Sat, 10 Oct 2020 08:35:41 +0000 (10:35 +0200)]
Generalize the sort+unique+erase pattern into unique_sort().
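
The sort+unique+erase pattern is the standard C++ idiom for deduplicating a
vector. A helper along these lines would capture it; the exact signature here
is a guess, not necessarily the one in the commit.

```cpp
#include <algorithm>
#include <vector>

// Sort the vector and drop duplicate elements in place.
template<class T>
void unique_sort(std::vector<T> *v)
{
    std::sort(v->begin(), v->end());
    // std::unique moves duplicates to the tail; erase() then trims them off.
    v->erase(std::unique(v->begin(), v->end()), v->end());
}
```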

Steinar H. Gunderson [Sat, 10 Oct 2020 08:33:36 +0000 (10:33 +0200)]
Remove the double filtering of too large posting lists; we would not even start I/O for it anyway, so there is less to save than was assumed.

Steinar H. Gunderson [Fri, 9 Oct 2020 22:52:26 +0000 (00:52 +0200)]
Better printing of trigrams in debug messages, especially with non-ASCII characters.

Also ends up setting locale, which we'll be needing soon.

Steinar H. Gunderson [Fri, 9 Oct 2020 21:48:56 +0000 (23:48 +0200)]
Full scans (not trigram-based) would always print counts, even without -c. Fix.

Steinar H. Gunderson [Fri, 9 Oct 2020 08:06:00 +0000 (10:06 +0200)]
clang-format again (IWYU and clang-format seemingly disagree).

Steinar H. Gunderson [Thu, 8 Oct 2020 22:09:33 +0000 (00:09 +0200)]
Run include-what-you-use.

Steinar H. Gunderson [Thu, 8 Oct 2020 21:57:23 +0000 (23:57 +0200)]
Move TurboPFor compilation to its own compilation unit.

This file takes so long to compile, especially with optimization
and/or ASan on, that it became a real annoyance whenever we were
modifying plocate.cpp for anything else. Takes away some genericness
we don't really use.

We could do the same thing with the encoder if need be.

Steinar H. Gunderson [Thu, 8 Oct 2020 21:51:55 +0000 (23:51 +0200)]
clang-format.

Steinar H. Gunderson [Thu, 8 Oct 2020 20:52:47 +0000 (22:52 +0200)]
Fix a harmless memory leak in plocate-build.

Steinar H. Gunderson [Thu, 8 Oct 2020 20:51:04 +0000 (22:51 +0200)]
Fix some Valgrind issues in plocate-build.

Steinar H. Gunderson [Thu, 8 Oct 2020 20:41:41 +0000 (22:41 +0200)]
Make the searcher ASan-clean.

Allocate 16 bytes extra as slop after every read buffer, so that
we know we never read outside allocated memory. (This is much easier
now that we have a TurboPFor implementation with clearly defined slop.)

Steinar H. Gunderson [Thu, 8 Oct 2020 20:35:28 +0000 (22:35 +0200)]
Unbreak runs with no --limit.

Steinar H. Gunderson [Thu, 8 Oct 2020 20:23:40 +0000 (22:23 +0200)]
Document slop requirements for TurboPFor decoding.

Steinar H. Gunderson [Thu, 8 Oct 2020 18:49:30 +0000 (20:49 +0200)]
Implement the --limit option.

Steinar H. Gunderson [Thu, 8 Oct 2020 18:05:41 +0000 (20:05 +0200)]
Implement the --count option.

Steinar H. Gunderson [Thu, 8 Oct 2020 08:09:56 +0000 (10:09 +0200)]
Switch build systems to Meson.

Steinar H. Gunderson [Wed, 7 Oct 2020 23:12:48 +0000 (01:12 +0200)]
Fix some warnings found by Clang.

Steinar H. Gunderson [Wed, 7 Oct 2020 23:05:30 +0000 (01:05 +0200)]
clang-format again.

Steinar H. Gunderson [Wed, 7 Oct 2020 23:01:55 +0000 (01:01 +0200)]
Switch to our own TurboPFor encoder.

This is much slower (plocate-build becomes ~6% slower or so),
but allows us to ditch the external TurboPFor dependency entirely,
and with it, the SSE4.1 demand. This should make us much more palatable
for most distributions.

The benchmark program is extended with some tests that all posting lists
in plocate.db round-trip properly through our encoder, which found a
lot of bugs during development.

Steinar H. Gunderson [Wed, 7 Oct 2020 22:42:09 +0000 (00:42 +0200)]
Remove unneeded vp4.h #include from plocate.cpp.

Steinar H. Gunderson [Tue, 6 Oct 2020 20:54:44 +0000 (22:54 +0200)]
Make the builder delta-encode posting lists as we go.

It's slightly faster, and puts less complexity load on the encoder.
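
Delta-encoding "as we go" means storing each docid as its distance from the
previous one at insertion time, rather than in a separate pass before
encoding. A minimal sketch, assuming docids arrive in nondecreasing order;
PostingListBuilder and decode_deltas are hypothetical names, and the first
delta is taken relative to zero.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Accumulates a posting list as deltas between consecutive docids.
class PostingListBuilder {
public:
    void add_docid(uint32_t docid)
    {
        assert(docid >= last_docid_);  // Assumed: input is sorted.
        if (!deltas_.empty() && docid == last_docid_) {
            return;  // Same docid seen again; nothing new to store.
        }
        deltas_.push_back(docid - last_docid_);
        last_docid_ = docid;
    }

    const std::vector<uint32_t> &deltas() const { return deltas_; }

private:
    uint32_t last_docid_ = 0;
    std::vector<uint32_t> deltas_;
};

// Inverse operation: a running sum turns deltas back into docids.
inline std::vector<uint32_t> decode_deltas(const std::vector<uint32_t> &deltas)
{
    std::vector<uint32_t> docids;
    uint32_t cur = 0;
    for (uint32_t d : deltas) {
        cur += d;
        docids.push_back(cur);
    }
    return docids;
}
```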

Steinar H. Gunderson [Tue, 6 Oct 2020 20:46:46 +0000 (22:46 +0200)]
Run clang-format.

Steinar H. Gunderson [Tue, 6 Oct 2020 20:20:14 +0000 (22:20 +0200)]
Fix a warning.

Steinar H. Gunderson [Tue, 6 Oct 2020 20:20:00 +0000 (22:20 +0200)]
Hand-roll zeroing of destination docids for SSE2; takes us seemingly up from ~84% to ~89% of reference.

Steinar H. Gunderson [Tue, 6 Oct 2020 20:01:22 +0000 (22:01 +0200)]
Fix 32-bit compile (without -msse2).

Steinar H. Gunderson [Tue, 6 Oct 2020 19:45:03 +0000 (21:45 +0200)]
Move exception shifting to later; allows us to get it into SSE2.

Steinar H. Gunderson [Tue, 6 Oct 2020 19:30:08 +0000 (21:30 +0200)]
Unroll and specialize decode_bitmap_sse2().

By asking GCC to unroll the loop, and specializing for the bit width
using templatizing, we can get rid of a lot of the control overhead.
This takes us up from 60% to 80% of reference performance, still
without requiring anything more than SSE2.
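
The specialization idea can be shown with a scalar stand-in: making the bit
width a template parameter turns the mask and shift amounts into compile-time
constants, so the compiler can unroll and simplify the loop. This is only an
illustration, not decode_bitmap_sse2() itself; unpack and mask_for_bits are
hypothetical names, it uses no SSE2, and the LSB-first byte packing is an
assumption rather than the real TurboPFor layout.

```cpp
#include <cstddef>
#include <cstdint>

// Mask with the low `bit_width` bits set, for widths 1..32. The shift is
// done in 64 bits, since shifting a 32-bit value by 32 is undefined behavior.
constexpr uint32_t mask_for_bits(unsigned bit_width)
{
    return static_cast<uint32_t>((uint64_t{1} << bit_width) - 1);
}

// Unpack `n` values of BitWidth bits each, packed LSB-first into bytes.
template<unsigned BitWidth>
void unpack(const uint8_t *in, uint32_t *out, size_t n)
{
    static_assert(BitWidth >= 1 && BitWidth <= 32, "unsupported bit width");
    uint64_t acc = 0;   // Bit accumulator; holds at most 39 valid bits.
    unsigned bits = 0;  // Number of valid bits currently in `acc`.
    for (size_t i = 0; i < n; ++i) {
        while (bits < BitWidth) {
            acc |= static_cast<uint64_t>(*in++) << bits;
            bits += 8;
        }
        out[i] = static_cast<uint32_t>(acc) & mask_for_bits(BitWidth);
        acc >>= BitWidth;
        bits -= BitWidth;
    }
}
```

A caller would dispatch once on the runtime bit width (for example with a
switch) and then run the matching instantiation for the whole block.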

Steinar H. Gunderson [Tue, 6 Oct 2020 19:27:05 +0000 (21:27 +0200)]
Fix undefined behavior when bit_width == 32.

Steinar H. Gunderson [Tue, 6 Oct 2020 18:58:19 +0000 (20:58 +0200)]
Move SSE2 bit reader functionality out into a class.

Steinar H. Gunderson [Tue, 6 Oct 2020 18:48:17 +0000 (20:48 +0200)]
Fuse delta decoding into the decoding loops where appropriate.