Skip to content

Add a simple tuple optimization pass #5937

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 70 commits into from
Sep 14, 2023
Merged

Add a simple tuple optimization pass #5937

merged 70 commits into from
Sep 14, 2023

Conversation

kripken
Copy link
Member

@kripken kripken commented Sep 13, 2023

In some cases tuples are obviously not needed, such as when they are only used
in local operations and make/extract. Such tuples are not used as return values or
in control flow structures, so we might as well lower them to individual locals per
lane, which other passes can optimize a lot better.

I believe LLVM does the same with its own tuples: it lowers them as much as
possible, leaving only necessary ones.

Fixes #5923

cc @askeksa

@codecov
Copy link

codecov bot commented Sep 13, 2023

Codecov Report

Merging #5937 (9cd8e92) into main (11dba9b) will increase coverage by 0.06%.
The diff coverage is 93.45%.

@@            Coverage Diff             @@
##             main    #5937      +/-   ##
==========================================
+ Coverage   42.61%   42.67%   +0.06%     
==========================================
  Files         484      485       +1     
  Lines       74831    74938     +107     
  Branches    11922    11953      +31     
==========================================
+ Hits        31886    31983      +97     
- Misses      39750    39754       +4     
- Partials     3195     3201       +6     
Files Changed Coverage Δ
src/passes/TupleOptimization.cpp 93.26% <93.26%> (ø)
src/passes/pass.cpp 83.45% <100.00%> (-0.12%) ⬇️

... and 3 files with indirect coverage changes

Copy link
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This will definitely be helpful.

It might be good to additionally test that sets of multivalue function returns inhibit the optimization.

Comment on lines 126 to 131
if (auto* set = value->dynCast<LocalSet>()) {
assert(set->isTee());
validUses[set->index]++;
validUses[curr->index]++;
copiedIndexes[set->index].insert(curr->index);
copiedIndexes[curr->index].insert(set->index);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (auto* set = value->dynCast<LocalSet>()) {
assert(set->isTee());
validUses[set->index]++;
validUses[curr->index]++;
copiedIndexes[set->index].insert(curr->index);
copiedIndexes[curr->index].insert(set->index);
if (auto* tee = value->dynCast<LocalSet>()) {
assert(tee->isTee());
validUses[tee->index]++;
validUses[curr->index]++;
copiedIndexes[tee->index].insert(curr->index);
copiedIndexes[curr->index].insert(tee->index);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of setting copiedIndexes twice, how about maintaining the invariant that the smaller index is always the key and setting it just once?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a set of tuples (x, y) that we can store by flipping them. We are given an index and need to find all related indexes to it - that is, we know one of x, y and want to know the other (and it is also a set of others, but that's separate).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so if we need to store the edges (0, 1), (1, 2), (0, 3), the current code would have:

{
  0: {1, 3},
  1: {0, 2},
  2: {1},
  3: {0},
}

I'm suggesting that instead we construct this mapping:

{
  0: {1, 3},
  1: {2},
}

From the comment mentioning that this is a bidirectional mapping, this is what I would have expected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm missing something. Say I have your mapping, and I get the index "2". I want to find the other indexes I need to mark as bad. How do I do that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would iterate through the map and for each bad key, you would add all the corresponding values to the work list and for each bad value, you would add the corresponding key to the work list. This may not be simpler overall!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but that's O(map size) for each index we realize is bad? It's O(num related indexes) in the code atm. So I worry changing this would be a regression, though it would save some memory otoh. For now though I think this is good enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

// right after it, depending on the tuple size.
std::unordered_map<Index, Index> tupleToNewBaseMap;
for (Index i = 0; i < good.size(); i++) {
if (good[i]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to early return if !good[i] here.

// This must be right after the former.
assert(newIndex == lastNewIndex + 1);
}
lastNewIndex = newIndex;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the if-else and lastNewIndex, this seems like a lot of machinery for some assertions. Is it worth it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is slightly excessive, but it guards against us modifying how we store locals or how we allocate new ones in addVar (like maybe we'll have a freelist?). Without these assertions a bug might be subtle, I worry.

// identify the local that was tee'd, so we know what to get (which has been
// replaced by the block). To make that simple keep a map of the things that
// replaced tees.
std::unordered_map<Expression*, LocalSet*> teeReplacements;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe replacedTees instead of teeReplacements? I think we usually name maps for their values.

Comment on lines 564 to 566
if (wasm->features.hasMultivalue()) {
addIfNoDWARFIssues("tuple-optimization");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we putting this here because it is just before all the local optimizations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of, and also after at least one optimize-instructions. I can add a comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will also be useful to do this after inlining, if that's not already the case.

Copy link
Member Author

@kripken kripken Sep 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, already the case: inlining-optimizing runs the full optimizer pipeline, which includes this.

;; CHECK-NEXT: (local.get $11)
;; CHECK-NEXT: )
;; CHECK-NEXT: )
(func $chain-3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be interesting to have a test case that copies a 2-tuple and adds another element to create a 3-tuple or that goes the other directly by dropping an element.

@kripken
Copy link
Member Author

kripken commented Sep 14, 2023

Thanks, feedback addressed + tests added.

Copy link
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % possibly changing how we store the mapping if it's a simplification overall.

@kripken
Copy link
Member Author

kripken commented Sep 14, 2023

We do not have very deep multivalue fuzzing, I worry, but I did fuzz it overnight on this.

@kripken kripken merged commit 3e8a9da into main Sep 14, 2023
@kripken kripken deleted the tuple.opt branch September 14, 2023 21:21
radekdoulik pushed a commit to dotnet/binaryen that referenced this pull request Jul 12, 2024
In some cases tuples are obviously not needed, such as when they are only used
in local operations and make/extract. Such tuples are not used as return values or
in control flow structures, so we might as well lower them to individual locals per
lane, which other passes can optimize a lot better.

I believe LLVM does the same with its own tuples: it lowers them as much as
possible, leaving only necessary ones.

Fixes WebAssembly#5923
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tuples inhibiting optimizations
2 participants