Skip to content

typekey_hash: length-independent run time for hashing long Vararg #57795

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

nsajko
Copy link
Contributor

@nsajko nsajko commented Mar 16, 2025

I'm not familiar with the C code, and I'm not sure this change is correct, but playing with it in the REPL doesn't reveal any obvious bugs.

Prior to this change, the run time of an operation like Tuple{Vararg{T, 1000000}} where {T} scales linearly with the field count (1000000 in this example). The goal of this change is to make the run time be independent from the field count.

The idea is to simply return the same hash for all field counts greater than or equal to a cut off.

Follows up on PR #50932, which introduced the loop linear in the field count.

I'm not familiar with the C code, and I'm not sure this change is
correct, but playing with it in the REPL doesn't reveal any obvious
bugs.

Prior to this change, the run time of an operation like
`Tuple{Vararg{T, 1000000}} where {T}` scales linearly with the field
count (1000000 in this example).  The goal of this change is to make
the run time be independent from the field count.

The idea is to simply return the same hash for all field counts greater
than or equal to a cut off.

Follows up on PR JuliaLang#50932, which introduced the loop linear in the field
count.
@nsajko nsajko added performance Must go faster types and dispatch Types, subtyping and method dispatch hashing labels Mar 16, 2025
@@ -1805,6 +1805,9 @@ static unsigned typekey_hash(jl_typename_t *tn, jl_value_t **key, size_t n, int
unsigned hashp = type_hash(p, &failed);
if (failed && !nofail)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea if I'm supposed handle failed or nofail somehow for this PR.

@nsajko
Copy link
Contributor Author

nsajko commented Mar 16, 2025

An alternative change I've been thinking about is to avoid the loop altogether when the Vararg length is greater than three. In that case we could simply bitmix the hashes for both the T and the N. Should be correct because of type normalization, I think?

@vtjnash
Copy link
Member

vtjnash commented Mar 17, 2025

Should be correct because of type normalization, I think?

Types are not normalized, so no, this PR is not safe. Just do not make Tuples this large and you'll be fine.

@vtjnash vtjnash closed this Mar 17, 2025
@nsajko
Copy link
Contributor Author

nsajko commented Mar 17, 2025

FTR this isn't about large tuples, that is, I never have tuple values that large, I just use types like Tuple{Vararg{T, n::Int}} where {T} to represent natural numbers as types.

@nsajko nsajko deleted the typekey_hash_performance_for_long_Vararg branch March 17, 2025 20:24
@vtjnash
Copy link
Member

vtjnash commented Mar 17, 2025

Int values are already the natural numbers as a type, and do not have these issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hashing performance Must go faster types and dispatch Types, subtyping and method dispatch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants