Use jump table for some switch statements #12978

chriseth · 2022-05-02T07:48:08Z

No description provided.

chriseth · 2022-05-05T08:20:48Z

This is looking really good already! We have to be careful to find a good balance between this being generic enough and not too complicated, but I would suggest the following items (that could maybe go into future pull requests):

implement this in optimized evm transform
use gas measurements and the "runs" parameter to decide whether or not to do the optimization
allow cases that are not exactly "enum-like" - for example it might not be such a big deal if some of the values are missing - we just use the same tag as the default case
think about how we could use this in the function dispatch - see Constant-complexity dispatcher idea #12650

jaa2 · 2022-05-05T16:46:01Z

Item 3 is now completed. Any range of cases spanning no more than 16 values is now supported by subtracting the minimum case if necessary (underflows desired), and empty cases are allowed to be stored in the pushed value, which jump to the default case.

chriseth · 2022-05-12T09:13:02Z

I'm not sure if that was already your reason for the relative offsets, but if we use relative offsets and can ensure them to be at most 256, we can actually fit 32 cases in a single push constant instead of just 16.

Also, @ekpyron suggested that as an alternative, we could also use a fully fixed-layout jump table instead, which would allow us to have no restriction on the size. The downside is that it needs another jump for each case and 4 additional bytes per case

let x :=<switch value>
if ge(x, <max value>) { /* something else */ }
jump(add(<start>, mul(x, <byte size of case>)))
// for each case:
jumpdest
pushtag <actual tag>
jump

jaa2 · 2022-05-12T16:03:47Z

The primary reason I included relative offsets was to eliminate a conditional jump for gas savings, but you're right, we could fit up to 32 cases in a single push, especially when the cases are sparse. The reason I haven't done it that way yet is for architecture reasons: the jump table code itself is currently built in EVMCodeTransform, and I wanted to make the fewest assumptions about the addresses where the cases begin. For instance, if we were to handle 32 cases and a case's address is >= 256 from the default case, the compilation would fail when deciding the jump addresses because the relative offset would not fit in a single byte.

But the compilation could currently fail anyway with backward jumps because the offsets are relative. Those aren't possible with the unoptimized version since all non-default cases come after the default case, but maybe with the optimizer that will become an issue.

Is there a way I can shift the assembly code generation of the jump table to be done after the case addresses are known? That way we could then decide if we could fit 32 cases and also whether we need to handle backward jumps. The code size would vary, so I'm not sure how I would handle that properly, maybe by taking up the maximum possible size and using some instruction for no-ops if the actual size is less.

fully fixed-layout jump table

Ha! That was my first idea. It is still limited by the range of the cases (we can't fit cases spanning a range of 20,000) and uses more gas (59 rather than 35), but if you have a larger range of cases it could be useful.

ekpyron · 2022-05-20T12:00:07Z

libevmasm/Assembly.cpp

+		case JumpTablePushTag:
+		{
+			unsigned pushSize = (unsigned) i.jumpTableTags().size() * bytesPerJumpTableTag;
+			if (pushSize == 0)


Can this happen? Should we maybe rather assert that there's at least one tag already when constructing JumpTablePushTag items?

I tried to write this in a way that was as permissive as possible, so technically a jump table with no cases would work here. But since there is not much use for a zero-case jump table, I have added asserts when constructing a JumpTablePushTag and when doing the final assembly.

ekpyron · 2022-05-20T12:01:57Z

libevmasm/JumpdestRemover.cpp

+		}
+		case JumpTablePushTag:
+		{
+			// TODO: Do we need to look at the sub here?


Jump table tags should always refer to the current sub-assembly, shouldn't they? We may want to assert this here, but don't need more than that.

That's what I was thinking, but I wasn't exactly sure of the sub-assembly architecture. Thank you for confirming that. I don't store the sub in the JumpTablePushTag anyway, so I just removed the comment.

libyul/backends/evm/ControlFlowGraphBuilder.cpp

ekpyron · 2022-06-16T10:58:13Z

libyul/backends/evm/StackLayoutGenerator.cpp

+					_addChild(_switch.defaultCase);
+				for (auto const& [caseValue, caseBlock]: _switch.cases)
+					_addChild(caseBlock);
+				_addChild(_switch.target);


Just to note it here again: it's only possible to directly jump to the target here, if there is no default case, so if there is a default case, _switch should not contain a target.
Alternatively, you could reuse the same field in Switch for both and call it defaultCaseOrAfterSwitch (or just defaultCase with a comment that it will point to the block after the switch, in case there was no explicit default case).

The reason I stored the two separately to begin with is that in the most optimized jump table, the default case block must come before all the other case blocks, since offsets from the default case block are stored in the jump table. But I now realize that it's also possible to arrange the jump table the other way around, i.e. to require the default case block to come after all the other case blocks and instead of storing a positive offset, we store negative offsets in the jump table. This way, we can neatly keep the default case block and target together in the way you suggested.

I actually hadn't considered that, but yeah :-)!

ekpyron · 2022-06-16T11:02:47Z

libyul/backends/evm/StackLayoutGenerator.cpp

@@ -810,6 +921,46 @@ void StackLayoutGenerator::fillInJunk(CFG::BasicBlock const& _block)
 			},
 			[&](CFG::BasicBlock::FunctionReturn const&) {},
 			[&](CFG::BasicBlock::Terminated const&) {},
+			[&](CFG::BasicBlock::Switch const& _switch)
+			{
+				/*std::vector<CFG::BasicBlock*> exits;


This part will be less of a concern after #12132 (see #12132 (comment)). But it's not crucial to reproduce this for switches in a first implementation at all, so you're fine leaving this commented, resp. removing this and ignoring the fillInJunk part apart from the child visits below.

ekpyron · 2022-06-16T11:06:32Z

libyul/backends/evm/StackLayoutGenerator.cpp

@@ -540,6 +583,47 @@ void StackLayoutGenerator::stitchConditionalJumps(CFG::BasicBlock const& _block)
 			},
 			[&](CFG::BasicBlock::FunctionReturn const&)	{},
 			[&](CFG::BasicBlock::Terminated const&) { },
+			[&](CFG::BasicBlock::Switch const& _switch)
+			{
+				/*Stack exitLayout = info.exitLayout;


You will need to keep this commented in, this part is crucial.

The commented code actually looks pretty much correct, apart from the fact that (as commented elsewhere), you don't jump to the target if there is a default case here.

ekpyron · 2022-06-16T11:15:38Z

For debugging the optimized evm code generation path, it may be good to know that you can visualize the control flow graphs and stack layouts by running our tests:

You need to have something installed that can consume graphviz dot files on stdin - I usually use xdot for that.

Then you can run (assuming a linux system, otherwise however you can set environment variables):

export ISOLTEST_DISPLAY_GRAPHS_ON_SUCCESS_COMMAND="xdot -"
export ISOLTEST_DISPLAY_GRAPHS_ON_FAILURE_COMMAND="xdot -"
build/test/tools/isoltest -t yulControlFlowGraph*

to get visualization of all control flow graph tests and

export ISOLTEST_DISPLAY_GRAPHS_SUCCESS="xdot -"
export ISOLTEST_DISPLAY_GRAPHS_FAILURE="xdot -"
build/test/tools/isoltest -t yulStackLayout*

to get visualization of all stack layout tests.
(It should actually be the same environment variables in both cases, we should probably align them :-))

The problem there, of course, is that those test runs won't understand the new Switch exit, so you will only be able to use this on develop, until you extend test/libyul/ControlFlowGraphTest.cpp and test/libyul/StackLayoutGeneratorTest.cpp to handle the new switch exit... but it may still be helpful to look at the existing graphs and stack layouts. And adjusting those tests may help in properly building the graphs and stack layouts by allowing to visualize them.

ekpyron · 2022-06-16T19:26:06Z

libyul/backends/evm/OptimizedEVMCodeTransform.cpp

+				}
+			}
+			else
+				m_assembly.appendJumpTo(m_blockLabels[_switch.target]);


I'd expect something like

Suggested change

m_assembly.appendJumpTo(m_blockLabels[_switch.target]);

{

ScopeGuard stackRestore([storedStack = m_stack, this]() {

m_stack = move(storedStack);

m_assembly.setStackHeight(static_cast<int>(m_stack.size()));

});

m_assembly.appendInstruction(evmasm::Instruction::POP);

m_stack.pop_back();

createStackLayout(debugDataOf(_switch), m_stackLayout.blockInfos.at(_switch.target).entryLayout);

assertLayoutCompatibility(m_stack, m_stackLayout.blockInfos.at(_switch.target).entryLayout);

(*this)(*_switch.target);

}

here

Added in 3d947dd

libyul/backends/evm/ControlFlowGraphBuilder.cpp

libyul/backends/evm/OptimizedEVMCodeTransform.cpp

ekpyron · 2022-06-16T19:59:57Z

libyul/backends/evm/OptimizedEVMCodeTransform.cpp

+					m_assembly.appendInstruction(evmasm::Instruction::POP);
+					m_stack.pop_back();
+					std::cout << "Adding case block " << caseValue << std::endl;
+					createStackLayout(debugDataOf(_switch), m_stackLayout.blockInfos.at(caseBlock).entryLayout);
+					assertLayoutCompatibility(m_stack, m_stackLayout.blockInfos.at(caseBlock).entryLayout);
+					(*this)(*caseBlock);


Actually the real problem, I think is here (and actually similarly for the default and target cases above).

This should actually just be nothing else than

Suggested change

m_assembly.appendInstruction(evmasm::Instruction::POP);

m_stack.pop_back();

std::cout << "Adding case block " << caseValue << std::endl;

createStackLayout(debugDataOf(_switch), m_stackLayout.blockInfos.at(caseBlock).entryLayout);

assertLayoutCompatibility(m_stack, m_stackLayout.blockInfos.at(caseBlock).entryLayout);

(*this)(*caseBlock);

(*this)(*caseBlock);

The main problem here is that only the (*this)(*caseBlock); will actually emit the label for the block caseBlock that you jumped to.
So the jump you generate earlier will just jump right past the POP you generated above and thus have a malformed stack.

The validation goes through since you're also changing m_stack, in a sense pretending like you did not jump with the layout you had when you actually jumped above, but one less slot. That's something to watch out for: the OptimizedCodeTransform really must keep m_stack valid, even across jumps, otherwise all the validation it does is worthless :-).

However, this also means that in the StackLayoutGenerator you will have to keep the switch expression in the entry layout of the case blocks, I'll add another comment about that there.

ekpyron · 2022-06-16T20:01:52Z

libyul/backends/evm/StackLayoutGenerator.cpp

+				// The last block must have produced the switch expression at the stack top.
+				yulAssert(!exitLayout.empty(), "");
+				yulAssert(exitLayout.back() == _switch.switchExpr, "");
+				// The switch expression is consumed by the switch.


Given my other comment, I think it's easier to generate layouts in which each switch case still has a copy of the switch expression in its entry layout - since that's how you'll actually end up jumping to them.
Not sure if this is the only place that needs adjustments due to that - but once you adjust the StackLayoutGeneratorTests you should be able to confirm if you get layouts that have the temporary switch expression slot as entires of the switch cases.

ekpyron · 2022-06-16T20:52:47Z

libyul/backends/evm/OptimizedEVMCodeTransform.cpp

+			for (auto const& [caseValue, caseBlock]: _switch.cases)
+				if (!m_blockLabels.count(caseBlock))
+					m_blockLabels[caseBlock] = m_assembly.newLabelId();


Thinking about it, there is one different option, other than adding the switch expression back to the stack layouts of the switch cases:

Instead of creating the actual block labels here, you could just store "case dispatch" labels in a vector and jump to those, and then below, emit those labels, pop the switch expression and only then visit the case blocks. That way you could keep the case stack layouts free of the switch expression and we don't need to readjust things for the jump table implementation.

But either way is fine - we'll in any case first need a traditional implementation with switch exits that works reliably before we can move to jump tables.

But yeah, that may be nicer - conceptually the case blocks doesn't really want the switch expression in their entry layout, so the fact that a traditional switch dispatch needs them is more of an implementation detail of the switch dispatch.

jaa2 · 2022-06-17T08:06:38Z

The switch block type is now in ControlFlowGraphTest and StackLayoutGeneratorTest. The new graphs seem correct to me, but I haven't yet committed updates to the test cases.

libyul/backends/evm/ControlFlowGraphBuilder.cpp

libyul/backends/evm/StackLayoutGenerator.cpp

libyul/backends/evm/OptimizedEVMCodeTransform.cpp

ekpyron · 2022-06-17T12:05:52Z

I've, out of curiosity, just built the latest PR version with additionally my last three comments addressed - and that version in fact passes all of our semantics tests, so that's great!

However, it has some adverse effects on the gas costs in quite a few cases, so it would be nice to look into some cases in more detail, compare the generated assembly and see if we can figure out where the increased cost is coming from and if we can do something about it. My guess would be that either the order of visitation of the blocks in the stack layout generator changed, resulting in less optimal layouts, or that it has something to do with the precise layout of code in the switch visit of the optimized code transform. But it's hard to tell without an in-depth look. It should be possible to get back similar gas results with only a few minor tweaks, though, I hope.

Co-authored-by: Daniel Kirchner <daniel@ekpyron.org>

ekpyron · 2022-08-09T18:52:03Z

For reference the gas differences of the latest version:

(The perpetual pools regression is probably merely due to non-determinism)

`ir-no-optimize`

project	bytecode_size	deployment_gas	method_gas
bleeps
brink
chainlink
colony	`0%`
elementfi
ens
euler
gnosis
gp2
perpetual-pools
pool-together
prb-math
trident
uniswap
yield_liquidator
zeppelin

`ir-optimize-evm+yul`

project	bytecode_size	deployment_gas	method_gas
bleeps	`0%`	`0%`	`0%`
brink	`0%`
chainlink	`0%`	`+0%`	`-0%`
colony	`0%`
elementfi	`0%`
ens	`0%`	`+0%`	`0%`
euler	`-0.01% ✅`	`-0.01% ✅`	`-0.09% ✅`
gnosis	`0%`	`-0%`	`-0%`
gp2	`0%`	`+0%`	`+0%`
perpetual-pools	`0%`	`+0%`	`+0.02% ❌`
pool-together	`0%`	`-0%`	`-0%`
prb-math	`0%`	`+0%`	`0%`
trident	`0%`	`-0%`	`-0%`
uniswap	`0%`	`+0%`	`+0%`
yield_liquidator	`0%`	`-0%`	`0%`
zeppelin	`0%`	`-0%`	`+0%`

`ir-optimize-evm-only`

project	bytecode_size	deployment_gas	method_gas
bleeps
brink
chainlink
colony	`0%`
elementfi
ens
euler
gnosis
gp2
perpetual-pools
pool-together
prb-math
trident
uniswap
yield_liquidator
zeppelin

`legacy-no-optimize`

project	bytecode_size	deployment_gas	method_gas
bleeps
brink	`0%`
chainlink	`0%`
colony	`0%`
elementfi	`0%`
ens	`0%`
euler	`-0.02% ✅`	`-0.02% ✅`	`-0.19% ✅`
gnosis	`0%`	`-0%`	`+0%`
gp2	`0%`
perpetual-pools	`0%`	`-0%`	`-0.01% ✅`
pool-together	`0%`	`-0%`	`+0%`
prb-math	`0%`	`0%`	`0%`
trident	`0%`	`0%`	`+0%`
uniswap	`0%`	`+0%`	`-0%`
yield_liquidator	`0%`	`+0%`	`0%`
zeppelin	`0%`	`+0%`	`-0%`

`legacy-optimize-evm+yul`

project	bytecode_size	deployment_gas	method_gas
bleeps	`0%`	`0%`	`0%`
brink	`0%`
chainlink	`0%`	`+0%`	`+0%`
colony	`0%`
elementfi	`0%`
ens	`0%`	`0%`	`0%`
euler	`-0.03% ✅`	`-0.03% ✅`	`-0.19% ✅`
gnosis	`0%`	`+0%`	`-0%`
gp2	`0%`	`+0%`	`+0%`
perpetual-pools	`0%`	`+0%`	`-0.02% ✅`
pool-together	`0%`	`+0%`	`+0%`
prb-math	`0%`	`0%`	`0%`
trident	`0%`	`-0%`	`-0%`
uniswap	`0%`	`+0%`	`+0%`
yield_liquidator	`0%`	`-0%`	`-0%`
zeppelin	`0%`	`-0%`	`+0%`

`legacy-optimize-evm-only`

project	bytecode_size	deployment_gas	method_gas
bleeps
brink	`0%`
chainlink	`0%`	`+0%`	`-0%`
colony	`0%`
elementfi	`0%`
ens	`0%`	`+0%`	`0%`
euler	`-0.02% ✅`	`-0.02% ✅`	`-0.13% ✅`
gnosis	`0%`	`-0%`	`-0%`
gp2	`0%`	`0%`	`+0%`
perpetual-pools	`0%`	`0%`	`+0%`
pool-together	`0%`	`0%`	`+0%`
prb-math	`0%`	`0%`	`0%`
trident	`0%`	`0%`	`-0%`
uniswap	`0%`	`0%`	`+0%`
yield_liquidator	`0%`	`+0%`	`0%`
zeppelin	`0%`	`-0%`	`+0%`

!V = version mismatch
!B = no value in the "before" version
!A = no value in the "after" version
!T = one or both values were not numeric and could not be compared
-0 = very small negative value rounded to zero
+0 = very small positive value rounded to zero

So it already kicks in without us e.g. changing the external dispatch to something that allows jump tables.

I hope I'll have time for a closer look at the implementation soon.

chriseth requested a review from ekpyron May 2, 2022 07:48

This comment was marked as spam.

Sign in to view

cameel added the external contribution ⭐ label May 12, 2022

ekpyron reviewed May 20, 2022

View reviewed changes

jaa2 dismissed a stale review via b25e8aa June 14, 2022 17:48

ekpyron mentioned this pull request Jun 16, 2022

Better Debugging Data (in particular for via-IR codegen) #13157

Closed