-
Notifications
You must be signed in to change notification settings - Fork 786
Add a simple tuple optimization pass #5937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## main #5937 +/- ##
==========================================
+ Coverage 42.61% 42.67% +0.06%
==========================================
Files 484 485 +1
Lines 74831 74938 +107
Branches 11922 11953 +31
==========================================
+ Hits 31886 31983 +97
- Misses 39750 39754 +4
- Partials 3195 3201 +6
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This will definitely be helpful.
It might be good to additionally test that sets of multivalue function returns inhibit the optimization.
src/passes/TupleOptimization.cpp
Outdated
if (auto* set = value->dynCast<LocalSet>()) { | ||
assert(set->isTee()); | ||
validUses[set->index]++; | ||
validUses[curr->index]++; | ||
copiedIndexes[set->index].insert(curr->index); | ||
copiedIndexes[curr->index].insert(set->index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (auto* set = value->dynCast<LocalSet>()) { | |
assert(set->isTee()); | |
validUses[set->index]++; | |
validUses[curr->index]++; | |
copiedIndexes[set->index].insert(curr->index); | |
copiedIndexes[curr->index].insert(set->index); | |
if (auto* tee = value->dynCast<LocalSet>()) { | |
assert(tee->isTee()); | |
validUses[tee->index]++; | |
validUses[curr->index]++; | |
copiedIndexes[tee->index].insert(curr->index); | |
copiedIndexes[curr->index].insert(tee->index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of setting copiedIndexes
twice, how about maintaining the invariant that the smaller index is always the key and setting it just once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a set of tuples (x, y)
that we can store by flipping them. We are given an index and need to find all related indexes to it - that is, we know one of x, y
and want to know the other (and it is also a set of others, but that's separate).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, so if we need to store the edges (0, 1), (1, 2), (0, 3), the current code would have:
{
0: {1, 3},
1: {0, 2},
2: {1},
3: {0},
}
I'm suggesting that instead we construct this mapping:
{
0: {1, 3},
1: {2},
}
From the comment mentioning that this is a bidirectional mapping, this is what I would have expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm missing something. Say I have your mapping, and I get the index "2". I want to find the other indexes I need to mark as bad. How do I do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You would iterate through the map and for each bad key, you would add all the corresponding values to the work list and for each bad value, you would add the corresponding key to the work list. This may not be simpler overall!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, but that's O(map size)
for each index we realize is bad? It's O(num related indexes)
in the code atm. So I worry changing this would be a regression, though it would save some memory otoh. For now though I think this is good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm
src/passes/TupleOptimization.cpp
Outdated
// right after it, depending on the tuple size. | ||
std::unordered_map<Index, Index> tupleToNewBaseMap; | ||
for (Index i = 0; i < good.size(); i++) { | ||
if (good[i]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to early return if !good[i]
here.
src/passes/TupleOptimization.cpp
Outdated
// This must be right after the former. | ||
assert(newIndex == lastNewIndex + 1); | ||
} | ||
lastNewIndex = newIndex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the if-else
and lastNewIndex
, this seems like a lot of machinery for some assertions. Is it worth it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is slightly excessive, but it guards against us modifying how we store locals or how we allocate new ones in addVar
(like maybe we'll have a freelist?). Without these assertions a bug might be subtle, I worry.
src/passes/TupleOptimization.cpp
Outdated
// identify the local that was tee'd, so we know what to get (which has been | ||
// replaced by the block). To make that simple keep a map of the things that | ||
// replaced tees. | ||
std::unordered_map<Expression*, LocalSet*> teeReplacements; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe replacedTees
instead of teeReplacements
? I think we usually name maps for their values.
src/passes/pass.cpp
Outdated
if (wasm->features.hasMultivalue()) { | ||
addIfNoDWARFIssues("tuple-optimization"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we putting this here because it is just before all the local optimizations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of, and also after at least one optimize-instructions
. I can add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will also be useful to do this after inlining, if that's not already the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, already the case: inlining-optimizing
runs the full optimizer pipeline, which includes this.
;; CHECK-NEXT: (local.get $11) | ||
;; CHECK-NEXT: ) | ||
;; CHECK-NEXT: ) | ||
(func $chain-3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be interesting to have a test case that copies a 2-tuple and adds another element to create a 3-tuple or that goes the other directly by dropping an element.
Co-authored-by: Thomas Lively <tlively@google.com>
Thanks, feedback addressed + tests added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % possibly changing how we store the mapping if it's a simplification overall.
We do not have very deep multivalue fuzzing, I worry, but I did fuzz it overnight on this. |
In some cases tuples are obviously not needed, such as when they are only used in local operations and make/extract. Such tuples are not used as return values or in control flow structures, so we might as well lower them to individual locals per lane, which other passes can optimize a lot better. I believe LLVM does the same with its own tuples: it lowers them as much as possible, leaving only necessary ones. Fixes WebAssembly#5923
In some cases tuples are obviously not needed, such as when they are only used
in local operations and make/extract. Such tuples are not used as return values or
in control flow structures, so we might as well lower them to individual locals per
lane, which other passes can optimize a lot better.
I believe LLVM does the same with its own tuples: it lowers them as much as
possible, leaving only necessary ones.
Fixes #5923
cc @askeksa