-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8342040: Further improve entry lookup performance for multi-release JARs #21489
Open
eirbjo
wants to merge
8
commits into
openjdk:master
Choose a base branch
from
eirbjo:mr-jar-lookup-speedup
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+64
−63
Open
Changes from 3 commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
f190251
Only check versions for the particular entry being looked up
eirbjo 4658139
Update copyright year
eirbjo b2c900d
Update field comment for metaVersions
eirbjo 826d78a
Use Arrays.sort instead of TreeSet
eirbjo ea9bc0c
Use BitSet to streamline construction
cl4es 34fa9ed
Fix traversal, traverse backwards to pick latest applicable version
cl4es 92acb20
Merge pull request #1 from cl4es/bitset_versions
eirbjo 771488e
Map versions by entry name hashcode instead of by entry name. This av…
eirbjo File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you just require order upon final freezing, you can just use a HashSet or ArrayList, and perform the sorting and deduplication when you compile to an int array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we trust that versioned entries are rare, this should not matter much either way. But I pushed a commit which uses HashSet and Arrays::sort on freezing instead. WDYT?
Given that most versioned entries will only have a single version, we may save some memory and speed by special-casing for sets with one or two versions:
But again, versioned entries are rare, so perhaps better to keep it simple. Does @cl4es have thoughts about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like simple.
Mixing Set.of and HashSet means a lot of semantic fu-fu (throwing/not throwing on nulls) and risks being suboptimal due making some call sites megamorphic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, looks good. And I agree with Claes' evaluation that we should not complicate this construction further, especially that the result is quickly garbage collected when we compile into the string to int array map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also just use
Map<String, int[]>
. This would allow us to skip the wholeSet<Integer>
toint[]
transition step, and the temporaryHashMap
is no longer needed.Deduplication could occur during registration:
while the postprocessing step would simply sort each array:
This performs ~10% better on a version-heavy
ZipFileOpen
, and I think the code is reasonably simple compared to the current state of the PR. Thoughts?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there is a cost with the String construction when parsing the name out of
META-INF/versions/{name}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few possible strategies for avoiding that additional parse, since the effect we're getting at here is to have a quick filter to avoid pointless lookups and not necessarily an exact mapping.
One is to store the
checkedHash
rather than the fullString
. This getsopenCloseZipFile
down to ~910000 ns/op. (checkedHash
very hot in profiles). There's still a chance for redundant lookups on hash collisions, but this should be rare.Another is to store a
BitSet
per name length. This getsZipFileOpen
down to baseline level (~670000 ns/op), but increases chance of having to do redundant lookups a lot.Both also improves footprint (not keeping each versioned entry
String
in memory would be nice).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a resonable trade-off. Could you take a look at the latest 771488e and see if that represents your suggestion here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, 771488e matches my quick experiment (sans some cleanups you've made). I agree this variant makes for a reasonable trade-off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe there's some benefit in detecting "META-INF/versions/" in the
checkAndAddEntry
method and then calculate the two different hash values back to back. Hash calculations are fast on modern hardware with vector support, but the memory accesses might not be so we might get a general win from re-arranging some of this stuff. I think this is more suitable for a follow-up experiment, though.