Skip to content

Optimize tuple.extract of gets in BinaryInstWriter #5941

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 14, 2023
Merged

Conversation

tlively
Copy link
Member

@tlively tlively commented Sep 14, 2023

In general, the binary lowering of tuple.extract expects that all the tuple
values are on top of the stack, so it inserts drops and possibly uses a scratch
local to ensure only the extracted value is left. However, when the extracted
tuple expression is a local.get, local.tee, or global.get, it's much more
efficient to change the lowering of the get or tee to ensure that only the
extracted value is on the stack to begin with. Implement that optimization in
the binary writer.

@tlively tlively requested a review from kripken September 14, 2023 21:08
@tlively
Copy link
Member Author

tlively commented Sep 14, 2023

Current dependencies on/for this PR:

This comment was auto-generated by Graphite.

@codecov
Copy link

codecov bot commented Sep 14, 2023

Codecov Report

Merging #5941 (20d855e) into main (6dde22c) will decrease coverage by 0.01%.
The diff coverage is 50.00%.

@@            Coverage Diff             @@
##             main    #5941      +/-   ##
==========================================
- Coverage   42.80%   42.79%   -0.01%     
==========================================
  Files         485      485              
  Lines       74943    74963      +20     
  Branches    11958    11965       +7     
==========================================
+ Hits        32076    32084       +8     
- Misses      39665    39670       +5     
- Partials     3202     3209       +7     
Files Changed Coverage Δ
src/wasm-stack.h 85.58% <ø> (+0.26%) ⬆️
src/wasm/wasm-stack.cpp 38.70% <50.00%> (+0.09%) ⬆️

... and 4 files with indirect coverage changes

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

)
)
(local.get $13)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great improvement!

src/wasm-stack.h Outdated
@@ -148,6 +148,11 @@ class BinaryInstWriter : public OverriddenVisitor<BinaryInstWriter> {
InsertOrderedMap<Type, Index> scratchLocals;
void countScratchLocals();
void setScratchLocals();

// local.get and local.tee expressions that will be followed by
// tuple.extracts. We can optimize these by only getting only the local
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// tuple.extracts. We can optimize these by only getting only the local
// tuple.extracts. We can optimize these by getting only the local

// We only need to get the single extracted value.
if (it->second == 0) {
o << int8_t(BinaryConsts::LocalTee)
<< U32LEB(mappedLocals[std::make_pair(curr->index, it->second)]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<< U32LEB(mappedLocals[std::make_pair(curr->index, it->second)]);
<< U32LEB(mappedLocals[std::make_pair(curr->index, 0)]);

o << int8_t(BinaryConsts::GlobalGet) << U32LEB(index + it->second);
return;
}
// Emit a global.get for each element if this is a tuple global
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, we allow tuple globals? Does the wasm spec?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we weren't originally going to support tuple globals, but when we first implemented all this tuple business we found that some passes needed them. I don't remember which ones those were, though.

The Wasm spec supports tuple globals only as much as it supports tuple locals, i.e. not at all. But that's ok!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... a little odd, but ok I guess 😄 I suppose the passes would need to create more globals or such, otherwise.

@@ -2532,6 +2564,15 @@ void BinaryInstWriter::countScratchLocals() {
for (auto& [type, _] : scratchLocals) {
noteLocalType(type);
}
// While we have all the tuple.extracts, also find extracts of local.gets and
// local.tees that we can optimize.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment could mention global gets too.

(global.get $g2)
(tuple.extract 1
(global.get $g2)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also have a test of a global.get of a tuple that does not flow into an extract?

@tlively tlively changed the base branch from main to move-mv-tests September 14, 2023 21:52
@tlively tlively force-pushed the better-tuple-extract branch from 952099a to 386b912 Compare September 14, 2023 21:52
@tlively tlively requested a review from kripken September 14, 2023 21:52
@tlively
Copy link
Member Author

tlively commented Sep 14, 2023

Rebased on top of a separate PR moving the tests to lit, which also made sure we are testing both an extract and non-extract use of a tuple global.get. The other comments have been addressed in the latest commit.

o << int8_t(BinaryConsts::GlobalGet) << U32LEB(index + it->second);
return;
}
// Emit a global.get for each element if this is a tuple global
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... a little odd, but ok I guess 😄 I suppose the passes would need to create more globals or such, otherwise.

Base automatically changed from move-mv-tests to main September 14, 2023 22:34
In general, the binary lowering of tuple.extract expects that all the tuple
values are on top of the stack, so it inserts drops and possibly uses a scratch
local to ensure only the extracted value is left. However, when the extracted
tuple expression is a local.get, local.tee, or global.get, it's much more
efficient to change the lowering of the get or tee to ensure that only the
extracted value is on the stack to begin with. Implement that optimization in
the binary writer.
@tlively tlively force-pushed the better-tuple-extract branch from 386b912 to 20d855e Compare September 14, 2023 22:36
@tlively tlively enabled auto-merge (squash) September 14, 2023 22:37
@tlively tlively merged commit 56ce1ea into main Sep 14, 2023
@tlively tlively deleted the better-tuple-extract branch September 14, 2023 23:15
@kripken
Copy link
Member

kripken commented Sep 15, 2023

Fuzz bug:

(module
 (type $0 (func (result i64 f32 i32 i32 i32)))
 (type $1 (func (result f32)))
 (func $0 (type $0) (result i64 f32 i32 i32 i32)
  (local $3 (i64 f32 i32 i32 i32))
  (local $4 f32)
  (local.set $3
   (tuple.make
    (i64.const 0)
    (call $1)
    (i32.const 0)
    (i32.const 0)
    (i32.const 0)
   )
  )
  (local.set $4
   (tuple.extract 1
    (local.get $3)
   )
  )
  (tuple.make
   (i64.const 0)
   (f32.const 0)
   (i32.const 0)
   (i32.const 0)
   (i32.const 0)
  )
 )
 (func $1 (type $1) (result f32)
  (f32.convert_i32_s
   (i32.const 0)
  )
 )
)

STR:

$ bin/wasm-opt w.wat  --gufa-cast-all  --generate-stack-ir --optimize-stack-ir --roundtrip --shrink-level=1 -all
[wasm-validator error in function 0] local.set's value type must be correct, on 
(local.set $2
 (i32.const 0)
)
Fatal: error after opts

@tlively
Copy link
Member Author

tlively commented Sep 15, 2023

Oof, I'm not sure how we should handle locals that have been optimized out in the stack IR, actually. Do you have an idea?

tlively added a commit that referenced this pull request Sep 15, 2023
tlively added a commit that referenced this pull request Sep 15, 2023
…5945)

This reverts commit 56ce1ea. The binary writer
optimization is not always correct when stack IR optimizations have run. Revert
the change until we can fix it.
@kripken
Copy link
Member

kripken commented Sep 18, 2023

I think it might be tricky to combine the two. Really the tuples should be lowered before Stack IR (is in Poppy IR), to make this clean. But for now, how about disabling local2stack on tuple locals in StackIR? The benefit of that optimization is around 1-2% at best, while your benefit here is quite large.

radekdoulik pushed a commit to dotnet/binaryen that referenced this pull request Jul 12, 2024
In general, the binary lowering of tuple.extract expects that all the tuple
values are on top of the stack, so it inserts drops and possibly uses a scratch
local to ensure only the extracted value is left. However, when the extracted
tuple expression is a local.get, local.tee, or global.get, it's much more
efficient to change the lowering of the get or tee to ensure that only the
extracted value is on the stack to begin with. Implement that optimization in
the binary writer.
radekdoulik pushed a commit to dotnet/binaryen that referenced this pull request Jul 12, 2024
…ly#5941)" (WebAssembly#5945)

This reverts commit 56ce1ea. The binary writer
optimization is not always correct when stack IR optimizations have run. Revert
the change until we can fix it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants