LibCompress: Speed up CanonicalCode::read_symbol() slow path
Symbols that need <= 8 bits hit a fast path as of #18075, but
the slow path has done a full binary search over all symbols
ever since this code was added in #2963. (#3405 even added a FIXME
for doing this, but #18075 removed it.)
Instead of doing a binary search over all codes for every single
bit read, this implements the Moffat-Turpin approach described at
https://www.hanshq.net/zip.html#huffdec, which only requires a
table read per bit.
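
For illustration, here is a minimal standalone sketch of a per-length table decoder that consumes one bit at a time. It is not the code in this commit: all names (CanonicalDecoder, build_decoder, read_symbol, the next_bit callback) are hypothetical, it uses the standard library rather than AK, and it keeps a first-code/count pair per length instead of the sentinel table from the linked article. It assumes the code lengths describe a valid canonical code with lengths of at most 15 bits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical names; this is an illustration, not LibCompress's API.
constexpr std::size_t max_code_length = 15;

struct CanonicalDecoder {
    // Per code length: the first canonical code, how many codes have that
    // length, and where their symbols start in sorted_symbols.
    std::array<std::uint32_t, max_code_length + 1> first_code {};
    std::array<std::uint32_t, max_code_length + 1> count {};
    std::array<std::uint32_t, max_code_length + 1> first_index {};
    // Symbols ordered by (code length, symbol value).
    std::vector<std::uint16_t> sorted_symbols;
};

// lengths[symbol] is the code length of that symbol, 0 meaning "unused".
inline CanonicalDecoder build_decoder(std::vector<std::uint8_t> const& lengths)
{
    CanonicalDecoder decoder;
    for (auto length : lengths) {
        assert(length <= max_code_length);
        if (length != 0)
            decoder.count[length]++;
    }

    // Canonical code assignment as in RFC 1951, section 3.2.2.
    std::uint32_t code = 0;
    std::uint32_t index = 0;
    for (std::size_t length = 1; length <= max_code_length; ++length) {
        code = (code + decoder.count[length - 1]) << 1;
        decoder.first_code[length] = code;
        decoder.first_index[length] = index;
        index += decoder.count[length];
    }

    // Place each used symbol into the slot reserved for its length.
    decoder.sorted_symbols.resize(index);
    auto next_slot = decoder.first_index;
    for (std::size_t symbol = 0; symbol < lengths.size(); ++symbol) {
        if (lengths[symbol] != 0)
            decoder.sorted_symbols[next_slot[lengths[symbol]]++] = static_cast<std::uint16_t>(symbol);
    }
    return decoder;
}

// next_bit() must return the next bit of the stream (0 or 1).
// Each bit costs one table lookup; there is no search over all symbols.
template<typename NextBit>
std::optional<std::uint16_t> read_symbol(CanonicalDecoder const& decoder, NextBit&& next_bit)
{
    std::uint32_t code = 0;
    for (std::size_t length = 1; length <= max_code_length; ++length) {
        code = (code << 1) | (static_cast<std::uint32_t>(next_bit()) & 1u);
        if (code >= decoder.first_code[length]) {
            std::uint32_t offset = code - decoder.first_code[length];
            if (offset < decoder.count[length])
                return decoder.sorted_symbols[decoder.first_index[length] + offset];
        }
    }
    return std::nullopt; // No code matched within 15 bits: invalid stream.
}
```

With the code lengths (3, 3, 3, 3, 3, 2, 4, 4) from the example in RFC 1951, the bits 0,0 decode to symbol 5 and the bits 1,1,1,1 decode to symbol 7.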
hyperfine 'Build/lagom/bin/unzip ~/Downloads/enwik8.zip'
1.008 s ± 0.016 s => 957.7 ms ± 3.9 ms, 5% faster
Due to issue #25005, we can't peek the full 15 bits at once but
have to read them one-by-one. This makes the code look a bit
different than in the linked article.
I also tried not changing CanonicalCode::from_bytes() too much.
It does 15 passes over all symbols. I think it could do it in
a single pass instead. But that's for a future change.
No behavior change (other than slightly faster perf).