Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inheriting from a frozen class causes instantiated child hashes to collide #528

Open
ejohb opened this issue May 3, 2019 · 6 comments
Open
Labels

Comments

@ejohb
Copy link

ejohb commented May 3, 2019

This works as expected:

class A:
    pass

class B(A):
    pass

class C(A):
    pass

assert hash(B) != hash(C)
assert hash(B()) != hash(C())

=========================== 1 passed in 3.39 seconds ===========================

This errors:

import attr

@attr.s(frozen=True)
class A:
    pass

class B(A):
    pass

class C(A):
    pass

assert hash(B) != hash(C)
assert hash(B()) != hash(C())

>       assert hash(B()) != hash(C())
E       AssertionError: assert 7618298384354620882 != 7618298384354620882
E        +  where 7618298384354620882 = hash(B())
E        +    where B() = <class 'test_attr_bug.<locals>.B'>()
E        +  and   7618298384354620882 = hash(C())
E        +    where C() = <class 'test_attr_bug.<locals>.C'>()

So these two instantiated classes hash to the same value despite a) being instances of different classes and b) being of classes which, when uninstantiated, have differing hashes.

This seems very wrong to me!

@ejohb ejohb changed the title Inheriting from a frozen class causes instantiated child hashed to collide Inheriting from a frozen class causes instantiated child hashes to collide May 3, 2019
@wsanchez wsanchez added the Bug label May 3, 2019
@gabbard
Copy link
Member

gabbard commented May 7, 2019

@EdwardJB : The behavior seems more general than this.

import attr
@attr.s(hash=True)
class A:
    pass

@attr.s(hash=True)
class B:
    pass

hash(A) == hash(B)
Out[5]: False

hash(A()) == hash(B())
Out[6]: True

If you look at how the hash is computed, the type appears not to be used anywhere (despite the name type_hash being used).

While the behavior might seem a little odd, it isn't wrong (since there are no hard guarantees about hash code collisions even for objects of the same type) and it is difficult for me to imagine a case where it has significant performance implications.

(to compare to a similar library, sampling some generated code from Java's Immutables indicates they do not incorporate the class into the hash code either)

@gabbard
Copy link
Member

gabbard commented May 7, 2019

It seems good to at least rename type_hash to a less misleading name in the code, though.

@gabbard
Copy link
Member

gabbard commented May 7, 2019

And the footnote on http://www.attrs.org/en/stable/hashing.html#id1 which says "The hash is computed by hashing a tuple that consists of an unique id for the class plus all attribute values" doesn't match the actual behavior.

@ejohb
Copy link
Author

ejohb commented May 7, 2019

And the footnote on http://www.attrs.org/en/stable/hashing.html#id1 which says "The hash is computed by hashing a tuple that consists of an unique id for the class plus all attribute values" doesn't match the actual behavior.

That's what I was relying on. Can you guys add an include_type_in_hash argument?

@gabbard
Copy link
Member

gabbard commented May 7, 2019

@EdwardJB : well, that's up to @hynek , etc. I'm just a random person. :-) . But I'm a little curious what your use case is where this matters? The only impact would be increasing the collision rate for collections which contain lots of objects which have identical content except for their type. I'm having a hard time imagining a circumstance which would require this?

@hynek
Copy link
Member

hynek commented May 8, 2019

Multiple things:

  • It is indeed OK for hashes to collide, see my blog post on that topic for details. That makes this kind of a feature request, not a bug. But I'm not gonna discuss semantics here. :)
  • I was thinking on removing the method caching anyways because it seems a fringe problem (saves some ram and setuptime when you create lots of identical but different classes) that introduces unnecessary complexity. This might be another instance where that might come handy.
  • If someone can find a way to remove the method caching and add a cached type hash instead that is mixed into the hash call, I would merge it (barring any unforseen problems).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants