Make short string hashing 30% faster by splitting Hash::hash_end from Hash::hash #29139
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Why?
Since hash functions are already designed to prevent collisions between a string and its prefixes, it's somewhat inelegant that we append a sentinel byte to strings for hashing. I was looking at #25237 a few days ago and realized that if we distinguish hashing contexts which are at the end of the key from those that aren't, we can suppress the sentinel byte (and also vector lengths) in the cases where they aren't needed; in addition to saving a byte of hashing, it saves a call to update and associated buffer-management overhead.
This attacks the same problem as #28044.
How?
This adds a new method
hash_end
to theHash
trait, which behaves exactly ashash
except that it need not produce a prefix-free encoding. It is always legal forhash_end
to be the same ashash
, and as such this is the default implementation. There are specialized implementations for strings and slices which remove the end/length markers.How much?
Here's a small benchmark script:
I ran it in each mode on the patched and baseline rust compilers (with
-O
, on x86_64 OSX), median of 27 runs each time, for the following timings:Subtracting out the baseline (0) case which just allocates and frees strings, it looks like a 34% improvement on short string hashing, 13% on hashset queries, and 2% on hashset insertions. Uncertainty for the medians seems to be around 10ms.
What's the catch?
Borrow
implementers) can no longer generally do so by forwarding thehash
method;hash_end
must be forwarded as well. This situation exists exactly once in the compiler.#[derive(Hash)]
has been modified to forwardhash_end
, so newtype-ish wrappers will just work (outside of the compiler; the compiler needs to implementhash_end
itself when it's needed forBorrow
, because we can't rely on the stage0 to do it.)Wait!
hash_end
should probably be feature gated. It's not in this version of the patch, because when I tried feature gating it deriving broke; I'm not sure how to tell rustc to ignore feature gates in deriving-generated code.I'm not sure whether this belongs as an RFC.