
[EVM] Add support for spills and reloads #828


Merged: 8 commits merged into main from evm_spills_reloads on Jun 24, 2025


@vladimirradosavljevic (Contributor) commented May 29, 2025

To add support for spills and reloads, the following has been implemented:

  1. The stackification process is now iterative, with a configurable
    maximum number of iterations. Spill candidates are identified
    and spilled in subsequent iterations until stack-too-deep errors
    are resolved or the iteration limit is reached. Instructions are
    emitted only once the errors are resolved.
  2. The heuristic for selecting which registers to spill and reload
    is based on LLVM's weight calculation. When an unreachable slot
    is encountered, all registers from that point to the end are
    considered, and the register with the lowest weight is chosen.
    This approach has yielded the best results in our benchmarks so far.
  3. To indicate that a register needs to be spilled, an IsSpill data member
    has been added to RegisterSlot, along with corresponding getter and
    setter methods.
  4. For all spills and reloads, a new PUSH_FRAME instruction is introduced
    that accepts a frame index. This is needed because the stack size of
    each function can only be allocated after stackification is done.
  5. To minimize the number of stack slots, StackSlotColoring is invoked
    after stackification. This pass relies on LLVM analyses (also used
    during register allocation) to assign stack slots based on the live
    intervals of their corresponding registers.
  6. EVMFinalizeStackFrames is introduced to calculate the stack size of each
    function and to replace frame indices with concrete stack offsets.
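The iterative loop in items 1 and 2 can be illustrated with a simplified, standalone model. This is not the backend's code: `Reg`, `Layout`, `findUnreachableSlot`, `pickSpillCandidate`, and `stackify` are hypothetical names. The model assumes the layout is one list of live registers with index 0 on top, treats anything deeper than the 16 slots DUP1..DUP16 can copy from as unreachable, and assumes a spilled register drops out of the layout entirely (its reloads near uses are not modeled).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical register model: a name plus a spill weight in the spirit of
// LLVM's weight calculation (higher weight = more expensive to spill).
struct Reg {
  std::string Name;
  float Weight;
};

// Hypothetical stack layout at one program point; index 0 is the top.
using Layout = std::vector<Reg>;

// Simplification: DUP1..DUP16 copy from the top 16 slots, so index 16 or
// deeper is treated as unreachable.
constexpr std::size_t kReachableDepth = 16;

std::optional<std::size_t> findUnreachableSlot(const Layout &L) {
  if (L.size() > kReachableDepth)
    return kReachableDepth;
  return std::nullopt;
}

// Consider every register from the unreachable slot to the bottom of the
// stack and pick the one with the lowest weight.
std::size_t pickSpillCandidate(const Layout &L, std::size_t From) {
  std::size_t Best = From;
  for (std::size_t I = From + 1; I < L.size(); ++I)
    if (L[I].Weight < L[Best].Weight)
      Best = I;
  return Best;
}

struct Result {
  bool Ok;                        // true if all stack-too-deep errors resolved
  std::set<std::string> Spilled;  // registers chosen for spilling
};

// Retry until the layout fits or the spill budget is exhausted.
// A budget of 0 disables spilling, as with -evm-max-spill-iterations=0.
Result stackify(Layout L, uint32_t MaxSpillIterations) {
  std::set<std::string> Spilled;
  for (uint32_t It = 0;; ++It) {
    std::optional<std::size_t> Bad = findUnreachableSlot(L);
    if (!Bad)
      return {true, Spilled};  // only now would instructions be emitted
    if (It == MaxSpillIterations)
      return {false, Spilled};
    std::size_t Victim = pickSpillCandidate(L, *Bad);
    Spilled.insert(L[Victim].Name);
    // In this model the spilled value no longer occupies a stack slot.
    L.erase(L.begin() + static_cast<std::ptrdiff_t>(Victim));
  }
}
```

For an 18-entry layout, each iteration spills the cheapest register at index 16 or below, so two iterations are needed; a cheap register that is still reachable is never chosen, and a budget of 0 fails immediately.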

Three new options are added to control this support:

  1. -evm-max-spill-iterations=uint32_t
    The maximum number of spill iterations performed during stackification.
    If 0, support for spills and reloads is disabled.
  2. -evm-stack-region-size=uint64_t
    The size of the stack region allocated for the whole module.
    If the allocated stack region is smaller than the actual total size,
    an error is produced.
  3. -evm-stack-region-offset=uint64_t
    The offset at which the stack region starts.

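Items 4 and 6, together with `-evm-stack-region-offset` and `-evm-stack-region-size`, suggest a simple layout step. The sketch below assumes (the PR does not state this) that each function receives its own disjoint window of the module-wide region, placed back to back from the region offset, and that every slot is one 32-byte word; `FunctionFrame`, `layoutStackRegion`, and `frameAddress` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical per-function frame: number of 32-byte spill slots that
// remain after stack slot coloring.
struct FunctionFrame {
  std::string Name;
  uint64_t NumSlots;
};

constexpr uint64_t kSlotSize = 32;  // one EVM word per spill slot

// Assumed layout: functions get consecutive windows starting at
// RegionOffset. Returns each function's base offset, or nullopt when the
// total exceeds RegionSize (where the real pass reports an error).
std::optional<std::vector<uint64_t>>
layoutStackRegion(const std::vector<FunctionFrame> &Frames,
                  uint64_t RegionOffset, uint64_t RegionSize) {
  std::vector<uint64_t> Bases;
  uint64_t Current = RegionOffset;
  uint64_t Total = 0;
  for (const FunctionFrame &F : Frames) {
    Bases.push_back(Current);
    uint64_t Size = F.NumSlots * kSlotSize;
    Current += Size;
    Total += Size;
  }
  if (Total > RegionSize)
    return std::nullopt;
  return Bases;
}

// Concrete address that would replace a PUSH_FRAME of slot `Slot`.
uint64_t frameAddress(uint64_t FunctionBase, uint64_t Slot) {
  return FunctionBase + Slot * kSlotSize;
}
```

With two functions needing 2 and 3 slots and a region at offset 128, the windows start at 128 and 192; a region smaller than 160 bytes is rejected.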
@vladimirradosavljevic (Contributor Author)

Even though this is a WIP patch, comments are welcome.


github-actions bot commented May 29, 2025

Results for: evm ir-llvm EVMInterpreter

Suite / mode          Size (-%)   Cycles (-%)   Ergs (-%)   Gas (-%)
All E +M3B3           0.000       0.000         0.000       0.000
All E +MzB3           0.000       0.000         0.000       0.000
All Y +M3B3           0.000       0.000         0.000       0.000
All Y +MzB3           0.000       0.000         0.000       0.000
Real life E +M3B3     0.000       0.000         0.000       0.000
Real life E +MzB3     0.000       0.000         0.000       0.000
Real life Y +M3B3     0.000       0.000         0.000       0.000
Real life Y +MzB3     0.000       0.000         0.000       0.000
(Best, Worst, and Total were each 0.000 for every entry.)


github-actions bot commented May 29, 2025

Target  Mode          Toolchain  Environment     Link
eravm   E+M3B3_0.4    ir-llvm    zk_evm          Results
eravm   E+M3B3_0.5    ir-llvm    zk_evm          Results
eravm   E+M3B3_0.6    ir-llvm    zk_evm          Results
eravm   E+M3B3_0.7    ir-llvm    zk_evm          Results
eravm   E+M3B3_0.8    ir-llvm    zk_evm          Results
eravm   E+MzB3_0.4    ir-llvm    zk_evm          Results
eravm   E+MzB3_0.5    ir-llvm    zk_evm          Results
eravm   E+MzB3_0.6    ir-llvm    zk_evm          Results
eravm   E+MzB3_0.7    ir-llvm    zk_evm          Results
eravm   E+MzB3_0.8    ir-llvm    zk_evm          Results
eravm   Y+M3B3        ir-llvm    zk_evm          Results
eravm   Y+MzB3        ir-llvm    zk_evm          Results
evm     E+M3B3        ir-llvm    EVMInterpreter  Results
evm     E+M3B3        ir-llvm    REVM            Results
evm     E+MzB3        ir-llvm    EVMInterpreter  Results
evm     E+MzB3        ir-llvm    REVM            Results
evm     Y+M3B3        ir-llvm    EVMInterpreter  Results
evm     Y+M3B3        ir-llvm    REVM            Results
evm     Y+MzB3        ir-llvm    EVMInterpreter  Results
evm     Y+MzB3        ir-llvm    REVM            Results
evm     E+_0.8        solc       EVMInterpreter  Results
evm     E+_0.8        solc       REVM            Results
evm     Y+            solc       EVMInterpreter  Results
evm     Y+            solc       REVM            Results

@abinavpp (Contributor)

I only reviewed EVMStackifyCodeEmitter::CodeEmitter::emitReload and emitSpill, and how EVM::PUSH_FRAME is lowered in EVMFinalizeStackFrames::finalizeStackFrames. All good!

Question: I just forced the spill area size in solc to be rounded up to a multiple of 32. I saw MF.getFrameInfo().CreateSpillStackObject(32, Align(32)), but just double-checking: is the offset calculated in the PUSH_FRAME lowering (CurrentStackRegionOffset + MFI.getObjectOffset(FrameIndexOp.getIndex())) always a multiple of 32? I'm concerned that an mload/mstore near the end of the spill area might cross a word boundary.

@vladimirradosavljevic (Contributor Author) commented May 30, 2025


Thanks for the comment!
As long as StackRegionOffset is a multiple of 32, every mload/mstore accesses a whole, word-aligned word. Good point; I will add an assert that StackRegionOffset is always a multiple of 32.
Note that MF.getFrameInfo().CreateSpillStackObject(32, Align(32)) creates 32-byte slots.
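The alignment argument in this reply can be checked mechanically: if the base is a multiple of 32 and every slot is a 32-byte word at a multiple-of-32 offset, every address is word-aligned. A minimal sketch, assuming slots are laid out contiguously from the region offset; `isValidRegionOffset` and `allSlotsWordAligned` are hypothetical helpers, not the assert actually added.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kWord = 32;

// The property the promised assert checks: the region offset is
// word-aligned.
bool isValidRegionOffset(uint64_t RegionOffset) {
  return RegionOffset % kWord == 0;
}

// Checks that each slot word [Addr, Addr + 32) is word-aligned and lies
// inside [RegionOffset, RegionOffset + RegionSize). Slot offsets are
// multiples of 32 (32-byte slots with Align(32)), so alignment of every
// address depends only on the base.
bool allSlotsWordAligned(uint64_t RegionOffset, uint64_t RegionSize,
                         uint64_t NumSlots) {
  for (uint64_t Slot = 0; Slot < NumSlots; ++Slot) {
    uint64_t Addr = RegionOffset + Slot * kWord;  // base + object offset
    if (Addr % kWord != 0 || Addr + kWord > RegionOffset + RegionSize)
      return false;
  }
  return true;
}
```

A misaligned base such as 130 breaks every access, while a region size that is not rounded up to a multiple of 32 leaves the last word sticking out of the region, which is the concern raised above.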

@vladimirradosavljevic vladimirradosavljevic changed the title [WIP][EVM] Add support for spills and reloads [EVM] Add support for spills and reloads Jun 4, 2025
@vladimirradosavljevic vladimirradosavljevic force-pushed the evm_spills_reloads branch 2 times, most recently from 8be3667 to 61dc562 on June 5, 2025 13:10
Instead of printing only the instruction name, display
the entire instruction along with its operands. This
provides better insight into the stackification algorithm
by making inputs and outputs more visible.

Signed-off-by: Vladimir Radosavljevic <vr@matterlabs.dev>
Signed-off-by: Vladimir Radosavljevic <vr@matterlabs.dev>
… YAML file

Signed-off-by: Vladimir Radosavljevic <vr@matterlabs.dev>
To add support for spills and reloads, the following has been implemented:

1. The stackification process is now iterative, with a configurable
   maximum number of iterations. Spill candidates are identified
   and spilled in subsequent iterations until stack-too-deep errors
   are resolved or the iteration limit is reached. Instructions are
   emitted only once the errors are resolved.
2. The heuristic for selecting which registers to spill and reload
   is based on LLVM's weight calculation. When an unreachable slot
   is encountered, all registers from that point to the end are
   considered, and the register with the lowest weight is chosen.
   This approach has yielded the best results in our benchmarks so far.
3. Spills and reloads are implemented on top of remat slots. This reduced
   the amount of code needed for this support (the alternative was to
   introduce a new StackSlot kind).
4. For all spills and reloads, a new PUSH_FRAME instruction is introduced
   that accepts a frame index. This is needed because the stack size of
   each function can only be allocated after stackification is done.
5. To minimize the number of stack slots, StackSlotColoring is invoked
   after stackification. This pass relies on LLVM analyses (also used
   during register allocation) to assign stack slots based on the live
   intervals of their corresponding registers.
6. EVMFinalizeStackFrames is introduced to calculate the stack size of each
   function and to replace frame indices with concrete stack offsets.

Three new options are added to control this support:
1. -evm-max-spill-iterations=uint32_t
   The maximum number of spill iterations performed during stackification.
   If 0, support for spills and reloads is disabled.
2. -evm-stack-region-size=uint64_t
   The size of the stack region allocated for the whole module.
   If the allocated stack region is smaller than the actual total size,
   an error is produced.
3. -evm-stack-region-offset=uint64_t
   The offset at which the stack region starts.

Signed-off-by: Vladimir Radosavljevic <vr@matterlabs.dev>
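Item 5's slot sharing rests on a simple fact: two spilled registers whose live intervals do not overlap can use the same 32-byte slot. The sketch below shows that idea with a simplified greedy pass over half-open intervals sorted by start point. It is not LLVM's StackSlotColoring implementation, which works on LLVM's own live-interval and stack-slot analyses; `Interval` and `colorSlots` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical live interval of a spilled register, half-open [Start, End).
struct Interval {
  unsigned Start, End;
};

// Visit intervals by start point and reuse the first slot whose previous
// occupant is already dead. Returns the slot index for each interval, in
// input order; the number of distinct slots is the frame size in words.
std::vector<unsigned> colorSlots(const std::vector<Interval> &Ivs) {
  std::vector<std::size_t> Order(Ivs.size());
  for (std::size_t I = 0; I < Order.size(); ++I)
    Order[I] = I;
  std::sort(Order.begin(), Order.end(), [&](std::size_t A, std::size_t B) {
    return Ivs[A].Start < Ivs[B].Start;
  });

  std::vector<unsigned> SlotEnd;  // end of the last interval placed in each slot
  std::vector<unsigned> Assigned(Ivs.size());
  for (std::size_t Idx : Order) {
    const Interval &Iv = Ivs[Idx];
    unsigned Slot = 0;
    while (Slot < SlotEnd.size() && SlotEnd[Slot] > Iv.Start)
      ++Slot;  // slot still occupied by a live value
    if (Slot == SlotEnd.size())
      SlotEnd.push_back(Iv.End);  // open a new 32-byte slot
    else
      SlotEnd[Slot] = Iv.End;     // reuse a slot whose occupant is dead
    Assigned[Idx] = Slot;
  }
  return Assigned;
}
```

Four spilled registers with intervals [0, 4), [2, 6), [5, 9), and [7, 10) fit in two slots instead of four, which is the kind of saving that shrinks each function's share of the stack region.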
Find all recursive functions in a module and mark
them with the 'evm-recursive' attribute. This attribute
is used later during stackification to issue an error
if stack-too-deep errors are found, since spills can't
be used in that case: spills go to a fixed memory
location per function rather than to the real stack,
so a recursive call would overwrite the values spilled
by its caller.

Signed-off-by: Vladimir Radosavljevic <vr@matterlabs.dev>
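The detection step described in this commit message amounts to finding functions that lie on a cycle of the call graph. A minimal standalone sketch, assuming the call graph is given as a map from caller to direct callees (indirect calls are not modeled); `CallGraph`, `reaches`, and `findRecursiveFunctions` are hypothetical names. A real pass would more likely walk strongly connected components, for example with LLVM's `scc_iterator`, which identifies the same functions in linear time; a per-function DFS is used here only for brevity.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: caller -> list of direct callees.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first search: can `Target` be reached from `From` through calls?
static bool reaches(const CallGraph &CG, const std::string &From,
                    const std::string &Target, std::set<std::string> &Seen) {
  auto It = CG.find(From);
  if (It == CG.end())
    return false;  // external or unknown function: no outgoing edges
  for (const std::string &Callee : It->second) {
    if (Callee == Target)
      return true;
    if (Seen.insert(Callee).second && reaches(CG, Callee, Target, Seen))
      return true;
  }
  return false;
}

// A function is recursive if it can reach itself, either directly or
// through mutual recursion. These are the functions that would be
// marked with 'evm-recursive'.
std::set<std::string> findRecursiveFunctions(const CallGraph &CG) {
  std::set<std::string> Recursive;
  for (const auto &Entry : CG) {
    const std::string &Fn = Entry.first;
    std::set<std::string> Seen;
    if (reaches(CG, Fn, Fn, Seen))
      Recursive.insert(Fn);
  }
  return Recursive;
}
```

Both self-recursion (a function calling itself) and mutual recursion (b calls c, c calls b) are caught, while a caller of a recursive function is not itself marked.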
For recursive functions, spills can't be used to fix
stack-too-deep issues, since spills go to memory rather
than to the real stack. If stack-too-deep issues are
encountered in a recursive function, stack compression
is forced across the whole function to try to fix them.

Signed-off-by: Vladimir Radosavljevic <vr@matterlabs.dev>
@akiramenai (Collaborator) left a comment

Can't we use the norecurse LLVM attribute? In addition to carrying semantics similar to !evm-recursive, it seems to enable some optimizations.

@vladimirradosavljevic (Contributor Author)


I investigated this and found two issues with norecurse:

  1. If a function contains an intrinsic that doesn't have the NoCallback attribute, the function won't be marked NoRecurse. This is not a problem by itself, as we can add this attribute to our intrinsics.
  2. At -O0, this attribute is not inferred, which could be a problem, as we might want to do spills and reloads in that mode too.

@akiramenai (Collaborator)

How many local changes are needed to make norecurse inference work at O0?
Is there a chance that NVPTX and SPIR-V would want it too and support upstreaming?

@vladimirradosavljevic (Contributor Author)


I'm not sure. If we want to go down this path, I would suggest creating a ticket for it, as pushing this to the community will be slow, and I don't know the historical reasons why this is disabled at O0.
The only target I see using it is AMDGPU (AMDGPUResourceUsageAnalysis), and that is only a usage analysis; it doesn't change the functionality of a module.

Signed-off-by: Vladimir Radosavljevic <vr@matterlabs.dev>
@akiramenai (Collaborator)

Recursion is a correctness issue for most GPUs, not only AMD's. They diagnose it at the very beginning of the pipeline (in clang) and at the very end (e.g. in PTX). Still, vendors like ARM generate code directly from LLVM and keep their backends proprietary, so they might support inferring norecurse at O0. Maybe someone already has local changes for it.
I think not maintaining a pass is better than maintaining one, even if upstreaming is slow. So we can go with a ticket, but it might be better not to postpone it and to at least ask the community about the status.
We could also keep a local change if it's small; that might be preferable to the pass.

@akiramenai (Collaborator) left a comment

LGTM

@PavelKopyl (Contributor) left a comment

LGTM, thanks!

Signed-off-by: Vladimir Radosavljevic <vr@matterlabs.dev>
@vladimirradosavljevic vladimirradosavljevic merged commit 28eb73d into main Jun 24, 2025
30 of 31 checks passed
@vladimirradosavljevic vladimirradosavljevic deleted the evm_spills_reloads branch June 24, 2025 08:03
4 participants