Skip to content

fix parsing of GOOG128 type (len>=26) #18

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

fix parsing of GOOG128 type (len>=26) #18

wants to merge 1 commit into from

Conversation

jsulmont
Copy link
Contributor

Hi
Google user identifier seen at the moment on AdEx are 27 char.
I am not sure about potential side effects of this -- here's what Jeremy said when discussing the topic on Skype:

"there are some implications for us internally to change it, as it changes the value of hash() and we have files serialized with those hashes"

But I don't see why, since the hash should be computed on the internal 128 bits.

Cheers

@jeremybarnes
Copy link
Contributor

The reason that it could cause a problem is that:

  • Previously, if we had a 27 character Google ID, it would be stored as a string, and its hash would have been cityHashString(str).
  • With this patch, it will be stored as a 128 bit integer, and the hash will be cityHash128(bits)

Those hashes will be different. Now imagine a memory mapped hash table that had been previously serialized under the old hashes. An attempt to search for it using the new hash will not find the entry, since it was under the previous hash.

That being said, this pull request should be merged; we will just need some time to manage the fallout before we do so.

@jsulmont
Copy link
Contributor Author

jsulmont commented Jul 3, 2013

Yes. Actually the hash lookup was fail because the internal types are different (STR vs GOOG128).
Since when is Google sending 27 character Google ID? I was under the impression that this type wasn't used ..

On Jul 2, 2013, at 6:33 PM, Jeremy Barnes notifications@github.com wrote:

The reason that it could cause a problem is that:

Previously, if we had a 27 character Google ID, it would be stored as a string, and its hash would have been cityHashString(str).
With this patch, it will be stored as a 128 bit integer, and the hash will be cityHash128(bits)
Those hashes will be different. Now imagine a memory mapped hash table that had been previously serialized under the old hashes. An attempt to search for it using the new hash will not find the entry, since it was under the previous hash.

That being said, this pull request should be merged; we will just need some time to manage the fallout before we do so.


Reply to this email directly or view it on GitHub.

@ghost ghost assigned jeremybarnes Jul 19, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants