Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use jump table for some switch statements #12978

Draft
wants to merge 33 commits into
base: develop
Choose a base branch
from

Conversation

chriseth
Copy link
Contributor

@chriseth chriseth commented May 2, 2022

No description provided.

@chriseth chriseth requested a review from ekpyron May 2, 2022 07:48
@philbawa

This comment was marked as spam.

@chriseth
Copy link
Contributor Author

chriseth commented May 5, 2022

This is looking really good already! We have to be careful to find a good balance between this being generic enough and not too complicated, but I would suggest the following items (that could maybe go into future pull requests):

  • implement this in optimized evm transform
  • use gas measurements and the "runs" parameter to decide whether or not to do the optimization
  • allow cases that are not exactly "enum-like" - for example it might not be such a big deal if some of the values are missing - we just use the same tag as the default case
  • think about how we could use this in the function dispatch - see Constant-complexity dispatcher idea #12650

@jaa2
Copy link
Contributor

jaa2 commented May 5, 2022

Item 3 is now completed. Any range of cases spanning no more than 16 values is now supported by subtracting the minimum case if necessary (underflows desired), and empty cases are allowed to be stored in the pushed value, which jump to the default case.

@chriseth
Copy link
Contributor Author

I'm not sure if that was already your reason for the relative offsets, but if we use relative offsets and can ensure them to be at most 256, we can actually fit 32 cases in a single push constant instead of just 16.

Also, @ekpyron suggested that as an alternative, we could also use a fully fixed-layout jump table instead, which would allow us to have no restriction on the size. The downside is that it needs another jump for each case and 4 additional bytes per case

let x :=<switch value>
if ge(x, <max value>) { /* something else */ }
jump(add(<start>, mul(x, <byte size of case>)))
// for each case:
jumpdest
pushtag <actual tag>
jump

@jaa2
Copy link
Contributor

jaa2 commented May 12, 2022

The primary reason I included relative offsets was to eliminate a conditional jump for gas savings, but you're right, we could fit up to 32 cases in a single push, especially when the cases are sparse. The reason I haven't done it that way yet is for architecture reasons: the jump table code itself is currently built in EVMCodeTransform, and I wanted to make the fewest assumptions about the addresses where the cases begin. For instance, if we were to handle 32 cases and a case's address is >= 256 from the default case, the compilation would fail when deciding the jump addresses because the relative offset would not fit in a single byte.

But the compilation could currently fail anyway with backward jumps because the offsets are relative. Those aren't possible with the unoptimized version since all non-default cases come after the default case, but maybe with the optimizer that will become an issue.

Is there a way I can shift the assembly code generation of the jump table to be done after the case addresses are known? That way we could then decide if we could fit 32 cases and also whether we need to handle backward jumps. The code size would vary, so I'm not sure how I would handle that properly, maybe by taking up the maximum possible size and using some instruction for no-ops if the actual size is less.

fully fixed-layout jump table

Ha! That was my first idea. It is still limited by the range of the cases (we can't fit cases spanning a range of 20,000) and uses more gas (59 rather than 35), but if you have a larger range of cases it could be useful.

case JumpTablePushTag:
{
unsigned pushSize = (unsigned) i.jumpTableTags().size() * bytesPerJumpTableTag;
if (pushSize == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this happen? Should we maybe rather assert that there's at least one tag already when constructing JumpTablePushTag items?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to write this in a way that was as permissive as possible, so technically a jump table with no cases would work here. But since there is not much use for a zero-case jump table, I have added asserts when constructing a JumpTablePushTag and when doing the final assembly.

}
case JumpTablePushTag:
{
// TODO: Do we need to look at the sub here?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jump table tags should always refer to the current sub-assembly, shouldn't they? We may want to assert this here, but don't need more than that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I was thinking, but I wasn't exactly sure of the sub-assembly architecture. Thank you for confirming that. I don't store the sub in the JumpTablePushTag anyway, so I just removed the comment.

_addChild(_switch.defaultCase);
for (auto const& [caseValue, caseBlock]: _switch.cases)
_addChild(caseBlock);
_addChild(_switch.target);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to note it here again: it's only possible to directly jump to the target here, if there is no default case, so if there is a default case, _switch should not contain a target.
Alternatively, you could reuse the same field in Switch for both and call it defaultCaseOrAfterSwitch (or just defaultCase with a comment that it will point to the block after the switch, in case there was no explicit default case).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I stored the two separately to begin with is that in the most optimized jump table, the default case block must come before all the other case blocks, since offsets from the default case block are stored in the jump table. But I now realize that it's also possible to arrange the jump table the other way around, i.e. to require the default case block to come after all the other case blocks and instead of storing a positive offset, we store negative offsets in the jump table. This way, we can neatly keep the default case block and target together in the way you suggested.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually hadn't considered that, but yeah :-)!

@@ -810,6 +921,46 @@ void StackLayoutGenerator::fillInJunk(CFG::BasicBlock const& _block)
},
[&](CFG::BasicBlock::FunctionReturn const&) {},
[&](CFG::BasicBlock::Terminated const&) {},
[&](CFG::BasicBlock::Switch const& _switch)
{
/*std::vector<CFG::BasicBlock*> exits;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part will be less of a concern after #12132 (see #12132 (comment)). But it's not crucial to reproduce this for switches in a first implementation at all, so you're fine leaving this commented, resp. removing this and ignoring the fillInJunk part apart from the child visits below.

@@ -540,6 +583,47 @@ void StackLayoutGenerator::stitchConditionalJumps(CFG::BasicBlock const& _block)
},
[&](CFG::BasicBlock::FunctionReturn const&) {},
[&](CFG::BasicBlock::Terminated const&) { },
[&](CFG::BasicBlock::Switch const& _switch)
{
/*Stack exitLayout = info.exitLayout;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need to keep this commented in, this part is crucial.

The commented code actually looks pretty much correct, apart from the fact that (as commented elsewhere), you don't jump to the target if there is a default case here.

@ekpyron
Copy link
Member

ekpyron commented Jun 16, 2022

For debugging the optimized evm code generation path, it may be good to know that you can visualize the control flow graphs and stack layouts by running our tests:

You need to have something installed that can consume graphviz dot files on stdin - I usually use xdot for that.

Then you can run (assuming a linux system, otherwise however you can set environment variables):

export ISOLTEST_DISPLAY_GRAPHS_ON_SUCCESS_COMMAND="xdot -"
export ISOLTEST_DISPLAY_GRAPHS_ON_FAILURE_COMMAND="xdot -"
build/test/tools/isoltest -t yulControlFlowGraph*

to get visualization of all control flow graph tests and

export ISOLTEST_DISPLAY_GRAPHS_SUCCESS="xdot -"
export ISOLTEST_DISPLAY_GRAPHS_FAILURE="xdot -"
build/test/tools/isoltest -t yulStackLayout*

to get visualization of all stack layout tests.
(It should actually be the same environment variables in both cases, we should probably align them :-))

The problem there, of course, is that those test runs won't understand the new Switch exit, so you will only be able to use this on develop, until you extend test/libyul/ControlFlowGraphTest.cpp and test/libyul/StackLayoutGeneratorTest.cpp to handle the new switch exit... but it may still be helpful to look at the existing graphs and stack layouts. And adjusting those tests may help in properly building the graphs and stack layouts by allowing to visualize them.

@jaa2 jaa2 dismissed a stale review via 6f0a560 June 16, 2022 16:06
}
}
else
m_assembly.appendJumpTo(m_blockLabels[_switch.target]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect something like

Suggested change
m_assembly.appendJumpTo(m_blockLabels[_switch.target]);
{
ScopeGuard stackRestore([storedStack = m_stack, this]() {
m_stack = move(storedStack);
m_assembly.setStackHeight(static_cast<int>(m_stack.size()));
});
m_assembly.appendInstruction(evmasm::Instruction::POP);
m_stack.pop_back();
createStackLayout(debugDataOf(_switch), m_stackLayout.blockInfos.at(_switch.target).entryLayout);
assertLayoutCompatibility(m_stack, m_stackLayout.blockInfos.at(_switch.target).entryLayout);
(*this)(*_switch.target);
}

here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 3d947dd

@jaa2 jaa2 dismissed a stale review via baf8beb June 16, 2022 19:49
Comment on lines 568 to 573
m_assembly.appendInstruction(evmasm::Instruction::POP);
m_stack.pop_back();
std::cout << "Adding case block " << caseValue << std::endl;
createStackLayout(debugDataOf(_switch), m_stackLayout.blockInfos.at(caseBlock).entryLayout);
assertLayoutCompatibility(m_stack, m_stackLayout.blockInfos.at(caseBlock).entryLayout);
(*this)(*caseBlock);
Copy link
Member

@ekpyron ekpyron Jun 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the real problem, I think is here (and actually similarly for the default and target cases above).

This should actually just be nothing else than

Suggested change
m_assembly.appendInstruction(evmasm::Instruction::POP);
m_stack.pop_back();
std::cout << "Adding case block " << caseValue << std::endl;
createStackLayout(debugDataOf(_switch), m_stackLayout.blockInfos.at(caseBlock).entryLayout);
assertLayoutCompatibility(m_stack, m_stackLayout.blockInfos.at(caseBlock).entryLayout);
(*this)(*caseBlock);
(*this)(*caseBlock);

The main problem here is that only the (*this)(*caseBlock); will actually emit the label for the block caseBlock that you jumped to.
So the jump you generate earlier will just jump right past the POP you generated above and thus have a malformed stack.

The validation goes through since you're also changing m_stack, in a sense pretending like you did not jump with the layout you had when you actually jumped above, but one less slot. That's something to watch out for: the OptimizedCodeTransform really must keep m_stack valid, even across jumps, otherwise all the validation it does is worthless :-).

However, this also means that in the StackLayoutGenerator you will have to keep the switch expression in the entry layout of the case blocks, I'll add another comment about that there.

// The last block must have produced the switch expression at the stack top.
yulAssert(!exitLayout.empty(), "");
yulAssert(exitLayout.back() == _switch.switchExpr, "");
// The switch expression is consumed by the switch.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given my other comment, I think it's easier to generate layouts in which each switch case still has a copy of the switch expression in its entry layout - since that's how you'll actually end up jumping to them.
Not sure if this is the only place that needs adjustments due to that - but once you adjust the StackLayoutGeneratorTests you should be able to confirm if you get layouts that have the temporary switch expression slot as entires of the switch cases.

Comment on lines +508 to +513
for (auto const& [caseValue, caseBlock]: _switch.cases)
if (!m_blockLabels.count(caseBlock))
m_blockLabels[caseBlock] = m_assembly.newLabelId();
Copy link
Member

@ekpyron ekpyron Jun 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it, there is one different option, other than adding the switch expression back to the stack layouts of the switch cases:

Instead of creating the actual block labels here, you could just store "case dispatch" labels in a vector and jump to those, and then below, emit those labels, pop the switch expression and only then visit the case blocks. That way you could keep the case stack layouts free of the switch expression and we don't need to readjust things for the jump table implementation.

But either way is fine - we'll in any case first need a traditional implementation with switch exits that works reliably before we can move to jump tables.

Copy link
Member

@ekpyron ekpyron Jun 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yeah, that may be nicer - conceptually the case blocks doesn't really want the switch expression in their entry layout, so the fact that a traditional switch dispatch needs them is more of an implementation detail of the switch dispatch.

@jaa2
Copy link
Contributor

jaa2 commented Jun 17, 2022

The switch block type is now in ControlFlowGraphTest and StackLayoutGeneratorTest. The new graphs seem correct to me, but I haven't yet committed updates to the test cases.

@ekpyron
Copy link
Member

ekpyron commented Jun 17, 2022

I've, out of curiosity, just built the latest PR version with additionally my last three comments addressed - and that version in fact passes all of our semantics tests, so that's great!

However, it has some adverse effects on the gas costs in quite a few cases, so it would be nice to look into some cases in more detail, compare the generated assembly and see if we can figure out where the increased cost is coming from and if we can do something about it. My guess would be that either the order of visitation of the blocks in the stack layout generator changed, resulting in less optimal layouts, or that it has something to do with the precise layout of code in the switch visit of the optimized code transform. But it's hard to tell without an in-depth look. It should be possible to get back similar gas results with only a few minor tweaks, though, I hope.

jaa2 and others added 27 commits August 7, 2022 10:15
Co-authored-by: Daniel Kirchner <daniel@ekpyron.org>
Co-authored-by: Daniel Kirchner <daniel@ekpyron.org>
Co-authored-by: Daniel Kirchner <daniel@ekpyron.org>
@ekpyron
Copy link
Member

ekpyron commented Aug 9, 2022

For reference the gas differences of the latest version:

(The perpetual pools regression is probably merely due to non-determinism)

ir-no-optimize

project bytecode_size deployment_gas method_gas
bleeps
brink
chainlink
colony 0%
elementfi
ens
euler
gnosis
gp2
perpetual-pools
pool-together
prb-math
trident
uniswap
yield_liquidator
zeppelin

ir-optimize-evm+yul

project bytecode_size deployment_gas method_gas
bleeps 0% 0% 0%
brink 0%
chainlink 0% +0% -0%
colony 0%
elementfi 0%
ens 0% +0% 0%
euler -0.01% ✅ -0.01% ✅ -0.09% ✅
gnosis 0% -0% -0%
gp2 0% +0% +0%
perpetual-pools 0% +0% +0.02% ❌
pool-together 0% -0% -0%
prb-math 0% +0% 0%
trident 0% -0% -0%
uniswap 0% +0% +0%
yield_liquidator 0% -0% 0%
zeppelin 0% -0% +0%

ir-optimize-evm-only

project bytecode_size deployment_gas method_gas
bleeps
brink
chainlink
colony 0%
elementfi
ens
euler
gnosis
gp2
perpetual-pools
pool-together
prb-math
trident
uniswap
yield_liquidator
zeppelin

legacy-no-optimize

project bytecode_size deployment_gas method_gas
bleeps
brink 0%
chainlink 0%
colony 0%
elementfi 0%
ens 0%
euler -0.02% ✅ -0.02% ✅ -0.19% ✅
gnosis 0% -0% +0%
gp2 0%
perpetual-pools 0% -0% -0.01% ✅
pool-together 0% -0% +0%
prb-math 0% 0% 0%
trident 0% 0% +0%
uniswap 0% +0% -0%
yield_liquidator 0% +0% 0%
zeppelin 0% +0% -0%

legacy-optimize-evm+yul

project bytecode_size deployment_gas method_gas
bleeps 0% 0% 0%
brink 0%
chainlink 0% +0% +0%
colony 0%
elementfi 0%
ens 0% 0% 0%
euler -0.03% ✅ -0.03% ✅ -0.19% ✅
gnosis 0% +0% -0%
gp2 0% +0% +0%
perpetual-pools 0% +0% -0.02% ✅
pool-together 0% +0% +0%
prb-math 0% 0% 0%
trident 0% -0% -0%
uniswap 0% +0% +0%
yield_liquidator 0% -0% -0%
zeppelin 0% -0% +0%

legacy-optimize-evm-only

project bytecode_size deployment_gas method_gas
bleeps
brink 0%
chainlink 0% +0% -0%
colony 0%
elementfi 0%
ens 0% +0% 0%
euler -0.02% ✅ -0.02% ✅ -0.13% ✅
gnosis 0% -0% -0%
gp2 0% 0% +0%
perpetual-pools 0% 0% +0%
pool-together 0% 0% +0%
prb-math 0% 0% 0%
trident 0% 0% -0%
uniswap 0% 0% +0%
yield_liquidator 0% +0% 0%
zeppelin 0% -0% +0%

!V = version mismatch
!B = no value in the "before" version
!A = no value in the "after" version
!T = one or both values were not numeric and could not be compared
-0 = very small negative value rounded to zero
+0 = very small positive value rounded to zero

So it already kicks in without us e.g. changing the external dispatch to something that allows jump tables.

I hope I'll have time for a closer look at the implementation soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants