From: Steinar H. Gunderson
Date: Tue, 13 Oct 2020 15:46:20 +0000 (+0200)
Subject: Use zstd dictionaries.
X-Git-Tag: 1.0.3~8
X-Git-Url: https://git.sesse.net/?a=commitdiff_plain;h=15235ad9419e1db22838f6e228404baa3d78de14;hp=15235ad9419e1db22838f6e228404baa3d78de14;p=plocate

Use zstd dictionaries.

Since we have small strings, they can benefit from some shared context,
and zstd supports this. plocate-build now reads the mlocate database
twice; the first pass samples 1000 random blocks, which it uses to train
a 1 kB dictionary. (zstd recommends much larger dictionaries, but
practical testing seems to indicate this doesn't help us much, and might
actually be harmful.)

We get ~20% slower builds and ~7% smaller .db files -- but more
interestingly, linear search speed is up ~20% (which indicates that
decompression in itself benefits more). plocate now has to read the 1 kB
dictionary, but this is practically free, since it is so small and is
stored right next to the header.

This is a version bump (to version 1), so we're not forward-compatible,
but we're backward-compatible (plocate still reads version 0 files just
fine). Since we're adding more fields to the header anyway, we can add a
new “max_version” field that allows for marking backwards-compatible
changes in the future, i.e., if plocate-build adds more information that
plocate would like to use but that older plocate versions can simply
ignore.
---
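
The version rule described in the commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not plocate's actual header layout or code: only the field name "max_version" comes from the message, while `HeaderVersions`, `kReaderVersion`, `can_read` and `can_use_feature` are hypothetical names. The semantics are also assumed: `version` is taken to be the format version a reader must understand, and `max_version` the newest version whose optional data is present in the file.

```cpp
#include <cstdint>

// Hypothetical subset of a database header; NOT plocate's real layout.
struct HeaderVersions {
    uint32_t version;      // assumed: format version a reader must understand
    uint32_t max_version;  // assumed: newest version whose optional data is present
};

// Assumed: the newest format version this (hypothetical) reader implements.
constexpr uint32_t kReaderVersion = 1;

// A reader can open any file whose required version it implements.
// Older files (e.g. version 0) stay readable; files that require a
// newer version than the reader knows are rejected.
constexpr bool can_read(const HeaderVersions &hdr,
                        uint32_t reader_version = kReaderVersion)
{
    return hdr.version <= reader_version;
}

// Optional data introduced in `feature_version` is usable only if the file
// advertises it via max_version and the reader itself is new enough.
// Older readers simply never ask for it and ignore the extra data.
constexpr bool can_use_feature(const HeaderVersions &hdr,
                               uint32_t feature_version,
                               uint32_t reader_version = kReaderVersion)
{
    return can_read(hdr, reader_version) &&
           feature_version <= hdr.max_version &&
           feature_version <= reader_version;
}
```

Under these assumptions, a version-1 reader opens a file with `{version = 0, max_version = 0}` and one with `{version = 1, max_version = 2}`, skipping the version-2 extras in the latter, but rejects a file with `{version = 2, max_version = 2}`. A version-0 reader rejects any version-1 file, which matches the "not forward-compatible" note in the message.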