-
Notifications
You must be signed in to change notification settings - Fork 84
Make set hashing robust to bitwise XOR cancellations #20
Conversation
The way sets (maps, structs, slices with the "set" tag) are hashed relies on an XOR to mix the hashes of their elements. This allows multiple independent items to interfere with each other through XOR operations. For example, consider a "set" slice with the following two structs: [ {Key: "a", Value: "v1"}, {Key: "a", Value: "v2"} ] Changing both keys from "a" to "b" will not change the hash value of the overall object, since in either case the two keys will cancel out under XOR arithmetic. This change fixes the problem by adding a step after every set of unordered hash operations that "hardens" the result by applying another hash. It will prevent components of one unordered hashing operation (for example, `Key: "a"` above) from interacting with the same data encountered later in a different context. Note that this changes hashes produced by the package for given inputs. Closes: mitchellh#18
An alternative approach would be to replace the XOR entirely by using the following approach for unordered collections:
This has a minor performance cost, but it's easier to demonstrate the correctness, so it might be a better solution. I'd be fine with rewriting it that way. |
Thanks for doing this! Yeah, that XORing always felt... wrong, and I regret not spending more time thinking about the problem. In the back of my mind I always felt there was an issue there, so I'm glad you've articulated it so clearly and pointed it out. Changing the has could be problematic. One way we might want to do this is via semver and just bumping the major version and documenting that as the backwards incompatibility. Note I haven't reviewed the code at all yet, just thinking of the high level. Thoughts? |
I like the semver idea. I have a private fork of this package that uses cryptographic hashes instead of FNV, to prevent collisions. If you are open to bumping the major version, it's worth considering a switch from Also, since opening this pull request, I changed my fork to use the approach I proposed above with sorting hashes instead of using XOR at all. I like it a lot better. Let me know if you would like me to submit that version instead of reviewing what I have here. |
Almost every project at HashiCorp depends on hashstructure in some way, so let me reach out and talk to those teams and maybe get a few more eyes on this and if we feel comfortable with the SemVer change (I think its fine, we're switching to Go Modules anyways). |
@mitchellh Any updates for this PR? |
Hey @aaronlehmann, I'm finally going to merge this into master. We're going to tag a v2 since hash formats are changing by default. I think going forward we can add support for alternate hash formats including a |
Note that master (v2) also changed the |
The way sets (maps, structs, slices with the "set" tag) are hashed
relies on an XOR to mix the hashes of their elements. This allows
multiple independent items to interfere with each other through XOR
operations. For example, consider a "set" slice with the following two
structs:
Changing both keys from "a" to "b" will not change the hash value of the
overall object, since in either case the two keys will cancel out under
XOR arithmetic.
This change fixes the problem by adding a step after every set of
unordered hash operations that "hardens" the result by applying another
hash. It will prevent components of one unordered hashing operation (for
example,
Key: "a"
above) from interacting with the same dataencountered later in a different context.
Note that this changes hashes produced by the package for given inputs.
Closes: #18