-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Caching hashes #591
Comments
Thanks for opening this, this use case makes sense to me. I think we should be able to implement |
I feel like I should expand on this a bit more:
We have some class A(msgspec.Struct):
name: str
position: int
class B(msgspec.Struct):
name: str
offset: int
assert hash(A("example", 0)) == hash(B("example", 0)) The two hashes are identical, which is not ideal. It's not a hard requirement that two objects that are not equal have different hashes, but it's something we try to avoid for performance reasons. We've gotten around this with custom |
Makes sense. Using disparate types as keys all in the same set/dict is not a use case I had in mind when I wrote the original hash implementation, but it's one we can support fairly easily. This should be fixed by #595 (I'll address |
Thank you for addressing this so quickly! I really appreciate it. |
This has now been released as part of version 0.18.5. |
Some quick benchmarks show that our tool is now 5-10% faster with msgspec than with attrs+pickle. Thank you again!! |
Description
The Use Case
We have an AST that needs to be serialized. Currently we're defining the AST classes using
attrs
and serializing it withpickle
, and we're in the middle of moving tomsgspec
for both aspects. (We really want the speed ofStruct
, and we need tagged unions.)When processing the AST, we do a lot of hashing. Hundreds of thousands of calls. This is easily solved by using
attrs
'scache_hash
option, cutting the number of__hash__()
calls by a significant amount and saving us a good chunk of time.The Feature
We'd really like msgspec to provide something like
cache_hash
. I think it should be easily implementable once private fields (#199) are available:_hash
(or similar) would be a private field that's calculated when the struct is initialized, and__hash__()
would just return_hash
. (Or it could be calculated the first time__hash__()
is called. Doesn't matter much, I think.)It would also be nice if this was usable with a custom
__hash__()
implementation, as the default implementation does not work well for us.I think this would be another configuration value, like
frozen
and the like. Alternatively, it could be enabled by default whenfrozen=True
.This is almost implementable by users already, except that frozen Structs can't be mutated (even in
__post_init__
) and the cached hash shouldn't be encoded/decoded.The text was updated successfully, but these errors were encountered: