git.sesse.net Git - plocate/log
plocate
3 years ago: Deduplicate docids as we go.
Steinar H. Gunderson [Mon, 28 Sep 2020 07:34:31 +0000 (09:34 +0200)]
Deduplicate docids as we go.

This saves ~50% RAM in the build step, now that we have blocking
(there's a lot of deduplication going on), and seemingly also
~15% execution time, possibly because of less memory allocation
(I haven't checked thoroughly).
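A minimal sketch of what deduplicating "as we go" could look like; it is not taken from the plocate source. It assumes (none of this is stated in the commit message) that the index is keyed by trigrams, that a docid is the number of the block a filename falls into, and that filenames are processed in order. The names `trigrams` and `build_postings` are hypothetical. Because docids then arrive in nondecreasing order, a duplicate can only ever be the last entry of a posting list, so one comparison per append is enough and no separate sort-and-unique pass is needed.

```python
from collections import defaultdict
from typing import Dict, List, Set


def trigrams(s: str) -> Set[str]:
    """Return the distinct three-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


def build_postings(filenames: List[str], block_size: int = 32) -> Dict[str, List[int]]:
    """Build trigram -> docid lists, dropping duplicate docids on insertion.

    Assumption: docid = index of the block the filename belongs to.
    """
    postings: Dict[str, List[int]] = defaultdict(list)
    for index, name in enumerate(filenames):
        docid = index // block_size
        for tri in trigrams(name):
            plist = postings[tri]
            # Docids never decrease, so only the last entry can be a duplicate.
            if not plist or plist[-1] != docid:
                plist.append(docid)
    return dict(postings)
```

With blocking, many filenames in the same block share trigrams from their common path prefix, which is why most appends are skipped and the lists stay short while building.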

3 years ago: Compress filenames with zstd.
Steinar H. Gunderson [Sun, 27 Sep 2020 22:28:49 +0000 (00:28 +0200)]
Compress filenames with zstd.

Make blocks of 32 filenames each, and compress them with zstd -6
(the level is fairly arbitrarily chosen). This compresses the repetitive
path information very well, and also allows us to have shorter posting
lists, as they can point into the blocks (allowing dedup).
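The blocking scheme described above can be sketched roughly as follows. This is an illustration only, not plocate's code or on-disk format: it uses zlib from the Python standard library as a stand-in for zstd so that it runs without extra packages, and the NUL-separated block layout and the names `compress_blocks` and `read_block` are assumptions made for the example.

```python
import zlib
from typing import List

BLOCK_SIZE = 32  # filenames per block, as in the commit message


def compress_blocks(filenames: List[str], block_size: int = BLOCK_SIZE,
                    level: int = 6) -> List[bytes]:
    """Group filenames into fixed-size blocks and compress each block.

    zlib stands in for zstd here; the block layout (NUL-separated
    UTF-8 names) is an assumption, not plocate's actual format.
    """
    blocks = []
    for start in range(0, len(filenames), block_size):
        chunk = filenames[start:start + block_size]
        raw = b"\0".join(name.encode("utf-8") for name in chunk)
        blocks.append(zlib.compress(raw, level))
    return blocks


def read_block(blob: bytes) -> List[str]:
    """Decompress one block and split it back into its filenames."""
    return [part.decode("utf-8") for part in zlib.decompress(blob).split(b"\0")]
```

A posting list then only needs to store the block number (the position in the list returned by `compress_blocks`); a lookup decompresses that one block with `read_block` and checks the at most 32 names inside it.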

32 was chosen after eyeballing some compressed sizes, looking for
diminishing returns and then verifying it didn't cost much in terms
of search performance.

3 years ago: In build debug output, print the total size.
Steinar H. Gunderson [Sun, 27 Sep 2020 22:28:43 +0000 (00:28 +0200)]
In build debug output, print the total size.

3 years ago: Initial checkin.
Steinar H. Gunderson [Sun, 27 Sep 2020 20:53:35 +0000 (22:53 +0200)]
Initial checkin.