-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use jump table for some switch statements #12978
base: develop
Are you sure you want to change the base?
Conversation
This comment was marked as spam.
This comment was marked as spam.
This is looking really good already! We have to be careful to find a good balance between this being generic enough and not too complicated, but I would suggest the following items (that could maybe go into future pull requests):
|
Item 3 is now completed. Any range of cases spanning no more than 16 values is now supported by subtracting the minimum case if necessary (underflows desired), and empty cases are allowed to be stored in the pushed value, which jump to the default case. |
I'm not sure if that was already your reason for the relative offsets, but if we use relative offsets and can ensure them to be at most 256, we can actually fit 32 cases in a single push constant instead of just 16. Also, @ekpyron suggested that as an alternative, we could also use a fully fixed-layout jump table instead, which would allow us to have no restriction on the size. The downside is that it needs another jump for each case and 4 additional bytes per case
|
The primary reason I included relative offsets was to eliminate a conditional jump for gas savings, but you're right, we could fit up to 32 cases in a single push, especially when the cases are sparse. The reason I haven't done it that way yet is for architecture reasons: the jump table code itself is currently built in EVMCodeTransform, and I wanted to make the fewest assumptions about the addresses where the cases begin. For instance, if we were to handle 32 cases and a case's address is >= 256 from the default case, the compilation would fail when deciding the jump addresses because the relative offset would not fit in a single byte. But the compilation could currently fail anyway with backward jumps because the offsets are relative. Those aren't possible with the unoptimized version since all non-default cases come after the default case, but maybe with the optimizer that will become an issue. Is there a way I can shift the assembly code generation of the jump table to be done after the case addresses are known? That way we could then decide if we could fit 32 cases and also whether we need to handle backward jumps. The code size would vary, so I'm not sure how I would handle that properly, maybe by taking up the maximum possible size and using some instruction for no-ops if the actual size is less.
Ha! That was my first idea. It is still limited by the range of the cases (we can't fit cases spanning a range of 20,000) and uses more gas (59 rather than 35), but if you have a larger range of cases it could be useful. |
libevmasm/Assembly.cpp
Outdated
case JumpTablePushTag: | ||
{ | ||
unsigned pushSize = (unsigned) i.jumpTableTags().size() * bytesPerJumpTableTag; | ||
if (pushSize == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this happen? Should we maybe rather assert that there's at least one tag already when constructing JumpTablePushTag items?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to write this in a way that was as permissive as possible, so technically a jump table with no cases would work here. But since there is not much use for a zero-case jump table, I have added asserts when constructing a JumpTablePushTag and when doing the final assembly.
libevmasm/JumpdestRemover.cpp
Outdated
} | ||
case JumpTablePushTag: | ||
{ | ||
// TODO: Do we need to look at the sub here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jump table tags should always refer to the current sub-assembly, shouldn't they? We may want to assert this here, but don't need more than that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I was thinking, but I wasn't exactly sure of the sub-assembly architecture. Thank you for confirming that. I don't store the sub in the JumpTablePushTag anyway, so I just removed the comment.
_addChild(_switch.defaultCase); | ||
for (auto const& [caseValue, caseBlock]: _switch.cases) | ||
_addChild(caseBlock); | ||
_addChild(_switch.target); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to note it here again: it's only possible to directly jump to the target here, if there is no default case, so if there is a default case, _switch
should not contain a target
.
Alternatively, you could reuse the same field in Switch
for both and call it defaultCaseOrAfterSwitch
(or just defaultCase
with a comment that it will point to the block after the switch, in case there was no explicit default case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I stored the two separately to begin with is that in the most optimized jump table, the default case block must come before all the other case blocks, since offsets from the default case block are stored in the jump table. But I now realize that it's also possible to arrange the jump table the other way around, i.e. to require the default case block to come after all the other case blocks and instead of storing a positive offset, we store negative offsets in the jump table. This way, we can neatly keep the default case block and target
together in the way you suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually hadn't considered that, but yeah :-)!
@@ -810,6 +921,46 @@ void StackLayoutGenerator::fillInJunk(CFG::BasicBlock const& _block) | |||
}, | |||
[&](CFG::BasicBlock::FunctionReturn const&) {}, | |||
[&](CFG::BasicBlock::Terminated const&) {}, | |||
[&](CFG::BasicBlock::Switch const& _switch) | |||
{ | |||
/*std::vector<CFG::BasicBlock*> exits; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part will be less of a concern after #12132 (see #12132 (comment)). But it's not crucial to reproduce this for switches in a first implementation at all, so you're fine leaving this commented, resp. removing this and ignoring the fillInJunk
part apart from the child visits below.
@@ -540,6 +583,47 @@ void StackLayoutGenerator::stitchConditionalJumps(CFG::BasicBlock const& _block) | |||
}, | |||
[&](CFG::BasicBlock::FunctionReturn const&) {}, | |||
[&](CFG::BasicBlock::Terminated const&) { }, | |||
[&](CFG::BasicBlock::Switch const& _switch) | |||
{ | |||
/*Stack exitLayout = info.exitLayout; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will need to keep this commented in, this part is crucial.
The commented code actually looks pretty much correct, apart from the fact that (as commented elsewhere), you don't jump to the target if there is a default case here.
For debugging the optimized evm code generation path, it may be good to know that you can visualize the control flow graphs and stack layouts by running our tests: You need to have something installed that can consume graphviz dot files on stdin - I usually use Then you can run (assuming a linux system, otherwise however you can set environment variables):
to get visualization of all control flow graph tests and
to get visualization of all stack layout tests. The problem there, of course, is that those test runs won't understand the new |
} | ||
} | ||
else | ||
m_assembly.appendJumpTo(m_blockLabels[_switch.target]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect something like
m_assembly.appendJumpTo(m_blockLabels[_switch.target]); | |
{ | |
ScopeGuard stackRestore([storedStack = m_stack, this]() { | |
m_stack = move(storedStack); | |
m_assembly.setStackHeight(static_cast<int>(m_stack.size())); | |
}); | |
m_assembly.appendInstruction(evmasm::Instruction::POP); | |
m_stack.pop_back(); | |
createStackLayout(debugDataOf(_switch), m_stackLayout.blockInfos.at(_switch.target).entryLayout); | |
assertLayoutCompatibility(m_stack, m_stackLayout.blockInfos.at(_switch.target).entryLayout); | |
(*this)(*_switch.target); | |
} |
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 3d947dd
m_assembly.appendInstruction(evmasm::Instruction::POP); | ||
m_stack.pop_back(); | ||
std::cout << "Adding case block " << caseValue << std::endl; | ||
createStackLayout(debugDataOf(_switch), m_stackLayout.blockInfos.at(caseBlock).entryLayout); | ||
assertLayoutCompatibility(m_stack, m_stackLayout.blockInfos.at(caseBlock).entryLayout); | ||
(*this)(*caseBlock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the real problem, I think is here (and actually similarly for the default and target cases above).
This should actually just be nothing else than
m_assembly.appendInstruction(evmasm::Instruction::POP); | |
m_stack.pop_back(); | |
std::cout << "Adding case block " << caseValue << std::endl; | |
createStackLayout(debugDataOf(_switch), m_stackLayout.blockInfos.at(caseBlock).entryLayout); | |
assertLayoutCompatibility(m_stack, m_stackLayout.blockInfos.at(caseBlock).entryLayout); | |
(*this)(*caseBlock); | |
(*this)(*caseBlock); |
The main problem here is that only the (*this)(*caseBlock);
will actually emit the label for the block caseBlock
that you jumped to.
So the jump you generate earlier will just jump right past the POP
you generated above and thus have a malformed stack.
The validation goes through since you're also changing m_stack
, in a sense pretending like you did not jump with the layout you had when you actually jumped above, but one less slot. That's something to watch out for: the OptimizedCodeTransform
really must keep m_stack
valid, even across jumps, otherwise all the validation it does is worthless :-).
However, this also means that in the StackLayoutGenerator
you will have to keep the switch expression in the entry layout of the case blocks, I'll add another comment about that there.
// The last block must have produced the switch expression at the stack top. | ||
yulAssert(!exitLayout.empty(), ""); | ||
yulAssert(exitLayout.back() == _switch.switchExpr, ""); | ||
// The switch expression is consumed by the switch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given my other comment, I think it's easier to generate layouts in which each switch case still has a copy of the switch expression in its entry layout - since that's how you'll actually end up jumping to them.
Not sure if this is the only place that needs adjustments due to that - but once you adjust the StackLayoutGeneratorTest
s you should be able to confirm if you get layouts that have the temporary switch expression slot as entires of the switch cases.
for (auto const& [caseValue, caseBlock]: _switch.cases) | ||
if (!m_blockLabels.count(caseBlock)) | ||
m_blockLabels[caseBlock] = m_assembly.newLabelId(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it, there is one different option, other than adding the switch expression back to the stack layouts of the switch cases:
Instead of creating the actual block labels here, you could just store "case dispatch" labels in a vector and jump to those, and then below, emit those labels, pop the switch expression and only then visit the case blocks. That way you could keep the case stack layouts free of the switch expression and we don't need to readjust things for the jump table implementation.
But either way is fine - we'll in any case first need a traditional implementation with switch exits that works reliably before we can move to jump tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But yeah, that may be nicer - conceptually the case blocks doesn't really want the switch expression in their entry layout, so the fact that a traditional switch dispatch needs them is more of an implementation detail of the switch dispatch.
The switch block type is now in ControlFlowGraphTest and StackLayoutGeneratorTest. The new graphs seem correct to me, but I haven't yet committed updates to the test cases. |
I've, out of curiosity, just built the latest PR version with additionally my last three comments addressed - and that version in fact passes all of our semantics tests, so that's great! However, it has some adverse effects on the gas costs in quite a few cases, so it would be nice to look into some cases in more detail, compare the generated assembly and see if we can figure out where the increased cost is coming from and if we can do something about it. My guess would be that either the order of visitation of the blocks in the stack layout generator changed, resulting in less optimal layouts, or that it has something to do with the precise layout of code in the switch visit of the optimized code transform. But it's hard to tell without an in-depth look. It should be possible to get back similar gas results with only a few minor tweaks, though, I hope. |
Co-authored-by: Daniel Kirchner <daniel@ekpyron.org>
Co-authored-by: Daniel Kirchner <daniel@ekpyron.org>
Co-authored-by: Daniel Kirchner <daniel@ekpyron.org>
For reference the gas differences of the latest version: (The perpetual pools regression is probably merely due to non-determinism)
|
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
bleeps | |||
brink | |||
chainlink | |||
colony | 0% |
||
elementfi | |||
ens | |||
euler | |||
gnosis | |||
gp2 | |||
perpetual-pools | |||
pool-together | |||
prb-math | |||
trident | |||
uniswap | |||
yield_liquidator | |||
zeppelin |
ir-optimize-evm+yul
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
bleeps | 0% |
0% |
0% |
brink | 0% |
||
chainlink | 0% |
+0% |
-0% |
colony | 0% |
||
elementfi | 0% |
||
ens | 0% |
+0% |
0% |
euler | -0.01% ✅ |
-0.01% ✅ |
-0.09% ✅ |
gnosis | 0% |
-0% |
-0% |
gp2 | 0% |
+0% |
+0% |
perpetual-pools | 0% |
+0% |
+0.02% ❌ |
pool-together | 0% |
-0% |
-0% |
prb-math | 0% |
+0% |
0% |
trident | 0% |
-0% |
-0% |
uniswap | 0% |
+0% |
+0% |
yield_liquidator | 0% |
-0% |
0% |
zeppelin | 0% |
-0% |
+0% |
ir-optimize-evm-only
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
bleeps | |||
brink | |||
chainlink | |||
colony | 0% |
||
elementfi | |||
ens | |||
euler | |||
gnosis | |||
gp2 | |||
perpetual-pools | |||
pool-together | |||
prb-math | |||
trident | |||
uniswap | |||
yield_liquidator | |||
zeppelin |
legacy-no-optimize
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
bleeps | |||
brink | 0% |
||
chainlink | 0% |
||
colony | 0% |
||
elementfi | 0% |
||
ens | 0% |
||
euler | -0.02% ✅ |
-0.02% ✅ |
-0.19% ✅ |
gnosis | 0% |
-0% |
+0% |
gp2 | 0% |
||
perpetual-pools | 0% |
-0% |
-0.01% ✅ |
pool-together | 0% |
-0% |
+0% |
prb-math | 0% |
0% |
0% |
trident | 0% |
0% |
+0% |
uniswap | 0% |
+0% |
-0% |
yield_liquidator | 0% |
+0% |
0% |
zeppelin | 0% |
+0% |
-0% |
legacy-optimize-evm+yul
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
bleeps | 0% |
0% |
0% |
brink | 0% |
||
chainlink | 0% |
+0% |
+0% |
colony | 0% |
||
elementfi | 0% |
||
ens | 0% |
0% |
0% |
euler | -0.03% ✅ |
-0.03% ✅ |
-0.19% ✅ |
gnosis | 0% |
+0% |
-0% |
gp2 | 0% |
+0% |
+0% |
perpetual-pools | 0% |
+0% |
-0.02% ✅ |
pool-together | 0% |
+0% |
+0% |
prb-math | 0% |
0% |
0% |
trident | 0% |
-0% |
-0% |
uniswap | 0% |
+0% |
+0% |
yield_liquidator | 0% |
-0% |
-0% |
zeppelin | 0% |
-0% |
+0% |
legacy-optimize-evm-only
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
bleeps | |||
brink | 0% |
||
chainlink | 0% |
+0% |
-0% |
colony | 0% |
||
elementfi | 0% |
||
ens | 0% |
+0% |
0% |
euler | -0.02% ✅ |
-0.02% ✅ |
-0.13% ✅ |
gnosis | 0% |
-0% |
-0% |
gp2 | 0% |
0% |
+0% |
perpetual-pools | 0% |
0% |
+0% |
pool-together | 0% |
0% |
+0% |
prb-math | 0% |
0% |
0% |
trident | 0% |
0% |
-0% |
uniswap | 0% |
0% |
+0% |
yield_liquidator | 0% |
+0% |
0% |
zeppelin | 0% |
-0% |
+0% |
!V
= version mismatch
!B
= no value in the "before" version
!A
= no value in the "after" version
!T
= one or both values were not numeric and could not be compared
-0
= very small negative value rounded to zero
+0
= very small positive value rounded to zero
So it already kicks in without us e.g. changing the external dispatch to something that allows jump tables.
I hope I'll have time for a closer look at the implementation soon.
No description provided.