Description
Inspired by https://users.rust-lang.org/t/hash-prefix-collisions/71823/10?u=scottmcm
`Hash::hash_slice` has a bunch of text clarifying that `h.hash_slice(&[a, b]); h.hash_slice(&[c]);` is not guaranteed to be the same as `h.hash_slice(&[a]); h.hash_slice(&[b, c]);`.
However, the documentation for `Hasher::write` is unclear about whether that same rule applies to it. It's very clear that `.write(&[a])` is not the same as `.write_u8(a)`, but not whether the same sequence of bytes passed to `write` is supposed to produce the same result even when the bytes are grouped differently, like `h.write(&[a, b]); h.write(&[c]);` vs `h.write(&[a]); h.write(&[b, c]);`.
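To make the two possible readings concrete, here is a minimal sketch of two toy hashers. Everything in it is hypothetical and for illustration only: `StreamHasher` is a byte-at-a-time FNV-1a stand-in (not `SipHasher`), `ChunkedHasher` additionally mixes in the length of every `write` call, and `hash_chunks` is just a helper that feeds a list of chunks to a fresh hasher.

```rust
use std::hash::Hasher;

const FNV_OFFSET: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

/// Toy byte-at-a-time hasher (FNV-1a): the result depends only on the
/// concatenation of all bytes written, not on how they were grouped.
struct StreamHasher(u64);

impl Default for StreamHasher {
    fn default() -> Self {
        StreamHasher(FNV_OFFSET)
    }
}

impl Hasher for StreamHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(FNV_PRIME);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hypothetical hasher that mixes in the length of every `write` call,
/// so the chunk boundaries become part of the hashed data.
#[derive(Default)]
struct ChunkedHasher(StreamHasher);

impl Hasher for ChunkedHasher {
    fn write(&mut self, bytes: &[u8]) {
        self.0.write(&(bytes.len() as u64).to_le_bytes());
        self.0.write(bytes);
    }

    fn finish(&self) -> u64 {
        self.0.finish()
    }
}

/// Feeds each chunk to a fresh hasher with one `write` call per chunk.
fn hash_chunks<H: Hasher + Default>(chunks: &[&[u8]]) -> u64 {
    let mut h = H::default();
    for chunk in chunks {
        h.write(chunk);
    }
    h.finish()
}
```

With `StreamHasher`, `hash_chunks(&[&[1, 2][..], &[3][..]])` and `hash_chunks(&[&[1][..], &[2, 3][..]])` agree; with `ChunkedHasher` they do not. The question is which of these two behaviours an implementation of `Hasher::write` is allowed to have.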
This is important for the same reason as the `VecDeque` example mentioned on `hash_slice`. If I have a circular byte buffer, is it legal for its `Hash` to just `.write` the two parts? Or does it need to `write_u8` each individual byte, since two circular buffers should compare equal regardless of where the split happens to be?
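As a sketch of the circular-buffer case, assuming a made-up `Ring` type (a fixed-size buffer whose logical contents start at `head` and wrap around) and a toy byte-at-a-time `StreamHasher` (FNV-1a, not the real `SipHasher`), the "just `.write` the two parts" version would look roughly like this. The `hash_of` helper is also hypothetical.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical circular byte buffer: logical contents are
/// `buf[head..]` followed by `buf[..head]`.
struct Ring {
    buf: Vec<u8>,
    head: usize,
}

impl Ring {
    /// Returns the logical contents as (front, back) slices.
    fn as_slices(&self) -> (&[u8], &[u8]) {
        let (back, front) = self.buf.split_at(self.head);
        (front, back)
    }
}

impl PartialEq for Ring {
    /// Equality ignores where the split between the two parts falls.
    fn eq(&self, other: &Self) -> bool {
        let (a1, a2) = self.as_slices();
        let (b1, b2) = other.as_slices();
        a1.iter().chain(a2).eq(b1.iter().chain(b2))
    }
}

impl Eq for Ring {}

impl Hash for Ring {
    fn hash<H: Hasher>(&self, state: &mut H) {
        let (front, back) = self.as_slices();
        // Length prefix for prefix-freedom, then the two parts as-is.
        state.write_usize(self.buf.len());
        state.write(front);
        state.write(back);
    }
}

/// Toy byte-at-a-time hasher (FNV-1a), insensitive to chunking by design.
struct StreamHasher(u64);

impl Hasher for StreamHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hashes a value with a fresh `StreamHasher`.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut h = StreamHasher(0xcbf29ce484222325);
    value.hash(&mut h);
    h.finish()
}
```

Two `Ring`s with the same logical contents but different `head` positions compare equal, and with this hasher they also hash equal. That `Hash` impl only upholds the `Hash`/`Eq` contract in general if every `Hasher` is required to treat `write` as chunking-independent, which is exactly what the docs don't say.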
Given that `Hash for str` and `Hash for [T]` already ensure prefix-freedom themselves, it feels to me like `write` should not be doing it again.
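One way to see that the prefix-freedom already lives in the `Hash` impls is to record what they feed to the hasher. This is only a sketch: `Recorder` and `recorded` are hypothetical names, and the exact framing bytes that `str` and `[T]` emit are an implementation detail I'm not relying on here, only that the recorded streams differ.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical hasher that records every byte it is fed instead of mixing it.
#[derive(Default)]
struct Recorder(Vec<u8>);

impl Hasher for Recorder {
    fn write(&mut self, bytes: &[u8]) {
        self.0.extend_from_slice(bytes);
    }

    fn finish(&self) -> u64 {
        0 // never used; we only inspect the recorded bytes
    }
}

/// Returns the full byte stream that `value`'s `Hash` impl produces.
fn recorded<T: Hash + ?Sized>(value: &T) -> Vec<u8> {
    let mut r = Recorder::default();
    value.hash(&mut r);
    r.0
}
```

Even though this recorder simply concatenates everything, `recorded(&("ab", "c"))` and `recorded(&("a", "bc"))` come out different, because the framing is added by the `Hash` impls before `write` ever sees the data.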
Also, our `SipHasher` implementation goes out of its way to make sure that different chunkings of `write`s are fine:
rust/library/core/src/hash/sip.rs
Lines 264 to 308 in 6bf3008
So it seems to me that this has been the expected behaviour the whole time. And if not, we should optimize `SipHasher` to be faster.
cc #80303, which led to this text in `hash_slice`.