Add a simple tuple optimization pass #5937

kripken · 2023-09-13T22:49:05Z

In some cases tuples are obviously not needed, such as when they are only used
in local operations and make/extract. Such tuples are not used as return values or
in control flow structures, so we might as well lower them to individual locals per
lane, which other passes can optimize a lot better.

I believe LLVM does the same with its own tuples: it lowers them as much as
possible, leaving only necessary ones.

Fixes #5923

cc @askeksa
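
For readers following along, here is a minimal sketch of the "only used in local operations and make/extract" condition. It is not the pass's code. The names `validUses` and `good` appear in the diff hunks in this PR; the `uses` counter, the `TupleLocalStats` struct, and `findCandidates` are hypothetical and exist only for illustration. The idea is that a tuple local is a lowering candidate when it is used at least once and every use was classified as one the lowering can handle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-local counters. `validUses` mirrors the name in the diff;
// `uses` is an assumed companion counter for this sketch.
struct TupleLocalStats {
  std::vector<std::size_t> uses;      // every get/set/tee seen for the local
  std::vector<std::size_t> validUses; // uses the lowering knows how to handle
};

// A local is a candidate only if it is used at all and all of its uses were
// counted as valid.
std::vector<bool> findCandidates(const TupleLocalStats& stats) {
  assert(stats.uses.size() == stats.validUses.size());
  std::vector<bool> good(stats.uses.size(), false);
  for (std::size_t i = 0; i < stats.uses.size(); i++) {
    good[i] = stats.uses[i] > 0 && stats.validUses[i] == stats.uses[i];
  }
  return good;
}
```

Under these assumptions, a local whose value comes from something like a multivalue call would end up with fewer valid uses than total uses, and so would not be a candidate.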

codecov · 2023-09-13T23:03:21Z

Codecov Report

Merging #5937 (9cd8e92) into main (11dba9b) will increase coverage by 0.06%.
The diff coverage is 93.45%.

@@            Coverage Diff             @@
##             main    #5937      +/-   ##
==========================================
+ Coverage   42.61%   42.67%   +0.06%     
==========================================
  Files         484      485       +1     
  Lines       74831    74938     +107     
  Branches    11922    11953      +31     
==========================================
+ Hits        31886    31983      +97     
- Misses      39750    39754       +4     
- Partials     3195     3201       +6

Files Changed	Coverage Δ
src/passes/TupleOptimization.cpp	`93.26% <93.26%> (ø)`
src/passes/pass.cpp	`83.45% <100.00%> (-0.12%)`	⬇️

... and 3 files with indirect coverage changes

tlively

Nice! This will definitely be helpful.

It might be good to additionally test that sets of multivalue function returns inhibit the optimization.

tlively · 2023-09-14T17:39:04Z

src/passes/TupleOptimization.cpp

+      if (auto* set = value->dynCast<LocalSet>()) {
+        assert(set->isTee());
+        validUses[set->index]++;
+        validUses[curr->index]++;
+        copiedIndexes[set->index].insert(curr->index);
+        copiedIndexes[curr->index].insert(set->index);


Suggested change

-      if (auto* set = value->dynCast<LocalSet>()) {
-        assert(set->isTee());
-        validUses[set->index]++;
-        validUses[curr->index]++;
-        copiedIndexes[set->index].insert(curr->index);
-        copiedIndexes[curr->index].insert(set->index);
+      if (auto* tee = value->dynCast<LocalSet>()) {
+        assert(tee->isTee());
+        validUses[tee->index]++;
+        validUses[curr->index]++;
+        copiedIndexes[tee->index].insert(curr->index);
+        copiedIndexes[curr->index].insert(tee->index);

Instead of setting copiedIndexes twice, how about maintaining the invariant that the smaller index is always the key and setting it just once?

This isn't a set of tuples (x, y) that we can store by flipping them. We are given an index and need to find all related indexes to it - that is, we know one of x, y and want to know the other (and it is also a set of others, but that's separate).

Right, so if we need to store the edges (0, 1), (1, 2), (0, 3), the current code would have:

{ 0: {1, 3}, 1: {0, 2}, 2: {1}, 3: {0}, }

I'm suggesting that instead we construct this mapping:

{ 0: {1, 3}, 1: {2}, }

From the comment mentioning that this is a bidirectional mapping, this is what I would have expected.

Sorry, I'm missing something. Say I have your mapping, and I get the index "2". I want to find the other indexes I need to mark as bad. How do I do that?

You would iterate through the map and for each bad key, you would add all the corresponding values to the work list and for each bad value, you would add the corresponding key to the work list. This may not be simpler overall!

Ah, but that's O(map size) for each index we realize is bad? It's O(num related indexes) in the code atm. So I worry changing this would be a regression, though it would save some memory otoh. For now though I think this is good enough.
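
The trade-off in this thread can be sketched as follows. This is a simplified model, not the pass's code: `copiedIndexes` and `good` are names from the diff, while `CopyGraph`, `addCopy`, `propagateBad`, and `badSeeds` are hypothetical. Storing each copy in both directions lets the worklist visit only the neighbors of a bad index, instead of scanning the whole map.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using Index = uint32_t;
using CopyGraph = std::unordered_map<Index, std::unordered_set<Index>>;

// Record a copy between two tuple locals in both directions, so that either
// endpoint can look up the other directly.
void addCopy(CopyGraph& copiedIndexes, Index a, Index b) {
  copiedIndexes[a].insert(b);
  copiedIndexes[b].insert(a);
}

// Mark every local connected to an initially bad local as bad as well.
// `good` holds one flag per local; `badSeeds` are locals with an invalid use.
void propagateBad(const CopyGraph& copiedIndexes,
                  std::vector<bool>& good,
                  const std::vector<Index>& badSeeds) {
  std::vector<Index> work;
  for (Index seed : badSeeds) {
    assert(seed < good.size());
    if (good[seed]) {
      good[seed] = false;
      work.push_back(seed);
    }
  }
  while (!work.empty()) {
    Index curr = work.back();
    work.pop_back();
    auto iter = copiedIndexes.find(curr);
    if (iter == copiedIndexes.end()) {
      continue;
    }
    // Only the neighbors of `curr` are visited here, not the whole map.
    for (Index other : iter->second) {
      assert(other < good.size());
      if (good[other]) {
        good[other] = false;
        work.push_back(other);
      }
    }
  }
}
```

With the edges (0, 1), (1, 2), (0, 3) from the example above, seeding the worklist with index 2 reaches 1, then 0, then 3, touching each index's neighbor set once. The one-directional mapping would save the duplicate entries but needs a scan over the map to find the key that points at 2.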

tlively · 2023-09-14T17:45:46Z

src/passes/TupleOptimization.cpp

+    // right after it, depending on the tuple size.
+    std::unordered_map<Index, Index> tupleToNewBaseMap;
+    for (Index i = 0; i < good.size(); i++) {
+      if (good[i]) {


It would be nice to early return if !good[i] here.

tlively · 2023-09-14T17:48:04Z

src/passes/TupleOptimization.cpp

+            // This must be right after the former.
+            assert(newIndex == lastNewIndex + 1);
+          }
+          lastNewIndex = newIndex;


With the if-else and lastNewIndex, this seems like a lot of machinery for some assertions. Is it worth it?

Maybe it is slightly excessive, but it guards against us modifying how we store locals or how we allocate new ones in addVar (like maybe we'll have a freelist?). Without these assertions a bug might be subtle, I worry.
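
To make the concern concrete, here is a hedged sketch of the allocation step with the contiguity assertion. `tupleToNewBaseMap`, `good`, `newIndex`, and `lastNewIndex` are names from the diff; `ToyFunction`, its append-only `addVar`, `tupleSizes`, and `allocateLanes` are assumptions made for this example and do not reflect the real `addVar` helper or how tuple sizes are obtained.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Index = uint32_t;

// Hypothetical stand-in for a function's locals. This `addVar` simply appends
// and returns the new index, which is an assumption of the sketch.
struct ToyFunction {
  Index numLocals = 0;
  Index addVar() { return numLocals++; }
};

// For each good tuple local, allocate one new local per lane and remember the
// index of the first one. Lane j of tuple i is then addressed as base + j.
std::unordered_map<Index, Index>
allocateLanes(ToyFunction& func,
              const std::vector<bool>& good,
              const std::vector<Index>& tupleSizes) {
  assert(good.size() == tupleSizes.size());
  std::unordered_map<Index, Index> tupleToNewBaseMap;
  for (Index i = 0; i < good.size(); i++) {
    if (!good[i]) {
      continue;
    }
    Index lastNewIndex = 0;
    for (Index j = 0; j < tupleSizes[i]; j++) {
      Index newIndex = func.addVar();
      if (j == 0) {
        tupleToNewBaseMap[i] = newIndex;
      } else {
        // base + lane addressing is only correct if lanes are contiguous.
        assert(newIndex == lastNewIndex + 1);
      }
      lastNewIndex = newIndex;
    }
  }
  return tupleToNewBaseMap;
}
```

If `addVar` were ever changed to reuse indexes (for example from a freelist), the assertion would fire instead of lanes silently mapping to the wrong locals.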

tlively · 2023-09-14T18:03:24Z

src/passes/TupleOptimization.cpp

+    // identify the local that was tee'd, so we know what to get (which has been
+    // replaced by the block). To make that simple keep a map of the things that
+    // replaced tees.
+    std::unordered_map<Expression*, LocalSet*> teeReplacements;


Maybe replacedTees instead of teeReplacements? I think we usually name maps for their values.

tlively · 2023-09-14T18:18:21Z

src/passes/pass.cpp

+  if (wasm->features.hasMultivalue()) {
+    addIfNoDWARFIssues("tuple-optimization");
+  }


Are we putting this here because it is just before all the local optimizations?

Sort of, and also after at least one optimize-instructions. I can add a comment.

It will also be useful to do this after inlining, if that's not already the case.

Yes, already the case: inlining-optimizing runs the full optimizer pipeline, which includes this.

tlively · 2023-09-14T18:29:17Z

test/lit/passes/tuple-optimization.wast

+  ;; CHECK-NEXT:   (local.get $11)
+  ;; CHECK-NEXT:  )
+  ;; CHECK-NEXT: )
+  (func $chain-3


It might be interesting to have a test case that copies a 2-tuple and adds another element to create a 3-tuple, or that goes the other direction by dropping an element.

Co-authored-by: Thomas Lively <tlively@google.com>

kripken · 2023-09-14T19:46:01Z

Thanks, feedback addressed + tests added.

tlively

LGTM % possibly changing how we store the mapping if it's a simplification overall.

kripken · 2023-09-14T21:20:59Z

We do not have very deep multivalue fuzzing, I worry, but I did fuzz this overnight.

kripken added 30 commits September 13, 2023 09:55

start

eead1ca

work

28f682d

work

2d5008f

work

6b9e063

work

cf54d39

work

0a576b9

work

0330419

work

2cbffd9

work

28e1f46

work

b9789a5

work

9b776f0

work

2f45444

work

377a7b3

fix

8e55f0c

fix

554d6e2

fix

52a98d7

fix

ddd2993

fix

9c43ac7

work

d80db80

fix

b06d868

fix

148d5c4

fix

44b2dde

fix

9b9eef0

fix

8841eed

fix

59ba402

fix

4cfb2bb

fix

180a98e

fix

684a01e

fix

ed20ca8

fix

6d1089b

kripken added 9 commits September 13, 2023 15:35

more

3917cac

more

e5bd82b

more

0d34707

more

48af11d

more

60829ba

more

3b5ea6a

more

2ece399

more

7af5549

rename

57c7c43

kripken requested a review from tlively September 13, 2023 22:49

kripken mentioned this pull request Sep 13, 2023

Tuples inhibiting optimizations #5923

Closed

tlively reviewed Sep 14, 2023

kripken and others added 10 commits September 14, 2023 12:29

Merge remote-tracking branch 'origin/main' into tuple.opt

902e0d3

add.test

c538850

Update src/passes/TupleOptimization.cpp

2640b2a

Co-authored-by: Thomas Lively <tlively@google.com>

Merge remote-tracking branch 'origin/tuple.opt' into tuple.opt

8fd65be

feedback

9dc8512

feedback

f736d2f

feedback

0230bed

fix

a53d3f5

test

958ed9a

test

9cd8e92

tlively approved these changes Sep 14, 2023

kripken merged commit 3e8a9da into main Sep 14, 2023

kripken deleted the tuple.opt branch September 14, 2023 21:21

kripken mentioned this pull request Dec 6, 2023

Enabling features by default in LLVM WebAssembly/tool-conventions#158

Closed
