Rotate by R bits at each step of NtHash #16
Closed
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
As discussed before, rotating by 1 at each step of nthash cause the high bits of the bits of the hash to be very correlated. A larger rotation mitigates this issue (though some other bits will still be correlated), as suggested experimentally here: https://github.com/imartayan/pfp-distribution. I ended up choosing R=25 since:
1/ Along with 7, it's the value such that (R * n) mod 32 takes the longest to be equal to 1 or -1
2/ I think rotating right by 7 (i.e. left by 25) is slightly better when we only keep the upper 16 bits for minimizers, since the 7 lowest bits (which are truncated) will become the 7 highest for the next iteration.
This change will affect all hash values including the default ones, so we might want to mark this as a breaking change (e.g. with version 2.0), and in this case it would probably be relevant to move nthash to a separate crate in the same release (#6).