Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What's going on with murmur3? #135

Open
warpfork opened this issue Feb 25, 2021 · 4 comments
Open

What's going on with murmur3? #135

warpfork opened this issue Feb 25, 2021 · 4 comments

Comments

@warpfork
Copy link
Contributor

This code looks an awful lot like it's doing a murmur3-32, and returning four bytes:

go-multihash/sum.go

Lines 146 to 154 in 6f1ea18

func sumMURMUR3(data []byte, length int) ([]byte, error) {
number := murmur3.Sum32(data)
bytes := make([]byte, 4)
for i := range bytes {
bytes[i] = byte(number & 0xff)
number >>= 8
}
return bytes, nil
}

And this code looks an awful lot like it's registering the above code as murmur3-128, which I suspect ought to return more than four bytes:

RegisterHashFunc(MURMUR3_128, sumMURMUR3)

Should I be worried?

And if this is a bug, how comfortable are we about fixing it vs potentially breaking things that rely on the current behavior?

@warpfork
Copy link
Contributor Author

I guess 1b82edf96afa4 is relevant, but I'm still sort of confused. Something with "128" in the name returning 4 bytes is... unexpected.

@Stebalien
Copy link
Member

Yeah, this seems wrong. I'm not sure what's going on here.

@Stebalien
Copy link
Member

I believe we should be using Sum64 (which is also not 128 but it's what go-unixfs uses). Honestly, we should never use this hash function for anything.

@aschmahmann
Copy link
Contributor

aschmahmann commented Mar 5, 2021

I did a bit of poking around and here's what I've got.

murmur3

UnixFS Sharding

Plan I think we should go with:

  • Figure out what we're actually using in UnixFS
  • If we can fix or throw some definitions into UnixFS that make the hash function actually match mumur3-x64-128
  • If not then we have a tough decision to make about if we need to redefine the code
  • In the meanwhile in order to merge Refactor registry system: no direct dependencies; expose standard hash.Hash; be a data carrier. #136, I think we should just remove the murmur3 hash entirely
  • I really really hope no one is using this murmur3 implementation, but we can try and do some spelunking to warn them. This implementation isn't really valid since UnixFS was AFAICT the first user and so the definition is supposed to match that one. It definitely hasn't been valid since we clarified murmer3 -> murmer3-(x64)-128.

Note: I'll put up some code on the endianness + test vectors when I get a chance. Need another pair of eyes to make sure I'm doing it correctly, and also need some context on UnixFS usage to make sure I'm understanding how the murmur3 hashes are being used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants