[EVM] Add support for spills and reloads #828
Force-pushed from 63a8cff to 849336f.
Even though this is a WIP patch, comments are welcome.
I only reviewed EVMStackifyCodeEmitter::CodeEmitter::emitReload and emitSpill, and how EVM::PUSH_FRAME is lowered in EVMFinalizeStackFrames::finalizeStackFrames. All good! Question: I just forced the spill area size in solc to round up to a multiple of 32. Saw the
Thanks for the comment!
Force-pushed from 849336f to f461d4e.
Force-pushed from 8be3667 to 61dc562.
Force-pushed from 61dc562 to e988d90.
Instead of printing only the instruction name, display the entire instruction along with its operands. This provides better insight into the stackification algorithm by making inputs and outputs more visible. Signed-off-by: Vladimir Radosavljevic <vr@matterlabs.dev>
Signed-off-by: Vladimir Radosavljevic <vr@matterlabs.dev>
… YAML file Signed-off-by: Vladimir Radosavljevic <vr@matterlabs.dev>
To add support for spills and reloads, the following has been implemented:
1. The stackification process is now iterative, with a configurable maximum number of iterations. Spill candidates are identified and spilled in subsequent iterations until stack-too-deep errors are resolved or the iteration limit is reached. Instructions are emitted only once the errors are resolved.
2. The heuristic for selecting which registers to spill and reload is based on LLVM's weight calculation. When an unreachable slot is encountered, all registers from that point to the end are considered, and the register with the lowest weight is chosen. This approach has yielded the best results in our benchmarks so far.
3. Spills and reloads are implemented around remat slots. This reduced the amount of code needed to support them (the alternative was to introduce a new StackSlot kind).
4. For all spills and reloads, a new PUSH_FRAME instruction is introduced that accepts a frame index. This is needed because the stack size for each function has to be allocated after stackification is done.
5. To minimize the number of stack slots, StackSlotColoring is invoked after stackification. This pass relies on LLVM analyses (also used during register allocation) to assign stack slots using the live intervals of their corresponding registers.
6. EVMFinalizeStackFrames is introduced to calculate stack sizes for each function and to replace frame indices with concrete stack offsets.

Three new options are added to control this support:
1. -evm-max-spill-iterations=uint32_t
   Controls how many iterations are performed during stackification. If 0, spill and reload support is disabled.
2. -evm-stack-region-size=uint64_t
   Sets the size of the allocated stack region for the whole module. If the allocated stack region is smaller than the actual total size, an error is produced.
3. -evm-stack-region-offset=uint64_t
   Sets the offset where the stack region starts.
Signed-off-by: Vladimir Radosavljevic <vr@matterlabs.dev>
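The iterative scheme and the lowest-weight heuristic from items 1 and 2 can be sketched roughly as follows. This is a toy model, not the code from the patch: VReg, firstUnreachable, and stackifyWithSpills are hypothetical names, the "stackification attempt" is reduced to counting unspilled registers against a reach limit (on the EVM, DUP and SWAP reach 16 slots), and the way the iteration limit is counted is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a virtual register and its LLVM-style spill weight.
struct VReg {
  unsigned Id;
  float Weight;
  bool Spilled = false;
};

// Toy "stackification attempt": unspilled registers occupy stack slots in
// order; the first one that falls beyond Reach is reported as unreachable.
std::optional<std::size_t> firstUnreachable(const std::vector<VReg> &Regs,
                                            std::size_t Reach) {
  std::size_t OnStack = 0;
  for (std::size_t I = 0; I < Regs.size(); ++I) {
    if (Regs[I].Spilled)
      continue;
    if (++OnStack > Reach)
      return I;
  }
  return std::nullopt;
}

// Retries until nothing is unreachable or MaxIterations spill rounds were
// used. Returns true when a valid layout was found (this is where a real
// implementation would start emitting instructions).
bool stackifyWithSpills(std::vector<VReg> &Regs, std::size_t Reach,
                        unsigned MaxIterations) {
  for (unsigned Iter = 0;; ++Iter) {
    std::optional<std::size_t> Bad = firstUnreachable(Regs, Reach);
    if (!Bad)
      return true;
    if (Iter == MaxIterations)
      return false; // limit reached; with a limit of 0 nothing is spilled
    assert(!Regs[*Bad].Spilled && "unreachable register must be on the stack");
    // Consider every unspilled register from the unreachable point to the
    // end and pick the one with the lowest weight.
    std::size_t Best = *Bad;
    for (std::size_t I = *Bad + 1; I < Regs.size(); ++I)
      if (!Regs[I].Spilled && Regs[I].Weight < Regs[Best].Weight)
        Best = I;
    Regs[Best].Spilled = true;
  }
}
```

With weights {5, 1, 3, 2}, a reach of 2, and a limit of 2, this sketch spills register 3 first and register 2 next, then reports success. With a limit of 0 it reports failure without spilling anything, mirroring the documented behavior of -evm-max-spill-iterations=0.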
Find all recursive functions in a module and mark them with the 'evm-recursive' attribute. This attribute is used later during stackification to issue an error if we run into stack-too-deep errors, since spills can't be used in that case: spills go to memory, not the real stack.
Signed-off-by: Vladimir Radosavljevic <vr@matterlabs.dev>
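One common way to find recursive functions is to compute strongly connected components (SCCs) of the call graph: a function is recursive if it calls itself directly or sits in an SCC with more than one member. The sketch below does this with Tarjan's algorithm over a plain adjacency list. The name findRecursive and the graph representation are hypothetical. Inside LLVM the same idea is usually expressed with scc_iterator over the CallGraph, and whether the patch does it that way is not stated here.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Calls[F] lists the indices of the functions that F calls directly.
// Returns a flag per function: true if it takes part in any recursion.
std::vector<bool> findRecursive(const std::vector<std::vector<int>> &Calls) {
  const int N = static_cast<int>(Calls.size());
  std::vector<int> Index(N, -1), Low(N, 0), Stack;
  std::vector<bool> OnStack(N, false), Recursive(N, false);
  int Next = 0;

  std::function<void(int)> Visit = [&](int V) {
    Index[V] = Low[V] = Next++;
    Stack.push_back(V);
    OnStack[V] = true;
    for (int W : Calls[V]) {
      assert(W >= 0 && W < N && "callee index out of range");
      if (W == V)
        Recursive[V] = true; // direct self-call
      if (Index[W] == -1) {
        Visit(W);
        Low[V] = std::min(Low[V], Low[W]);
      } else if (OnStack[W]) {
        Low[V] = std::min(Low[V], Index[W]);
      }
    }
    if (Low[V] != Index[V])
      return; // V is not the root of its SCC
    // V is an SCC root: pop all members of the component.
    std::vector<int> Members;
    int W;
    do {
      W = Stack.back();
      Stack.pop_back();
      OnStack[W] = false;
      Members.push_back(W);
    } while (W != V);
    // More than one member means mutual recursion.
    if (Members.size() > 1)
      for (int M : Members)
        Recursive[M] = true;
  };

  for (int V = 0; V < N; ++V)
    if (Index[V] == -1)
      Visit(V);
  return Recursive;
}
```

For a call graph where function 0 calls 1, functions 1 and 2 call each other, function 3 calls itself, and function 4 calls nothing, the sketch flags 1, 2, and 3. Function 0 only reaches a cycle and is not itself recursive, so it stays unflagged.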
For recursive functions, we can't use spills to fix stack-too-deep issues, because spills go to memory and not the real stack. If we run into stack-too-deep issues in a recursive function, we force stack compression across the whole function to try to fix them.
Signed-off-by: Vladimir Radosavljevic <vr@matterlabs.dev>
Can't we use the norecurse LLVM attribute? In addition to carrying semantics similar to !evm-recursive, it seems to enable some optimizations.
I was investigating this, and I found 2 issues with
How many local changes are needed to make
I'm not sure, but if we want to go down this path, I would suggest creating a ticket for it, as trying to push this to the community will be slow. I'm also not sure of the historical reasons why this is disabled for O0.
Signed-off-by: Vladimir Radosavljevic <vr@matterlabs.dev>
Force-pushed from e988d90 to c3bf6ef.
Recursion is a correctness issue for most GPUs, not only AMD. They diagnose it at the very beginning of the pipeline, in clang, and at the very end (e.g. in PTX). Still, vendors like ARM do codegen directly from LLVM and keep their backends proprietary, so they might support inferring norecurse at O0. Maybe someone already has some local changes.
LGTM
LGTM, thanks!
Signed-off-by: Vladimir Radosavljevic <vr@matterlabs.dev>
To add support for spills and reloads, the following has been implemented:
1. The stackification process is now iterative, with a configurable
   maximum number of iterations. Spill candidates are identified
   and spilled in subsequent iterations until stack-too-deep errors
   are resolved or the iteration limit is reached. Instructions are
   emitted only once the errors are resolved.
2. The heuristic for selecting which registers to spill and reload
   is based on LLVM's weight calculation. When an unreachable slot
   is encountered, all registers from that point to the end are
   considered, and the register with the lowest weight is chosen.
   This approach has yielded the best results in our benchmarks so far.
3. ... has been added to RegisterSlot, along with corresponding getter and
   setter methods.
4. For all spills and reloads, a new PUSH_FRAME instruction is introduced that
   accepts a frame index. This is needed because the stack size for each
   function has to be allocated after stackification is done.
5. To minimize the number of stack slots, StackSlotColoring is invoked
   after stackification. This pass relies on LLVM analyses (also used
   during register allocation) to assign stack slots using the live
   intervals of their corresponding registers.
6. EVMFinalizeStackFrames is introduced to calculate stack sizes for each function
   and to replace frame indices with concrete stack offsets.

Three new options are added to control this support:
1. -evm-max-spill-iterations=uint32_t
   Controls how many iterations are performed during stackification.
   If 0, spill and reload support is disabled.
2. -evm-stack-region-size=uint64_t
   Sets the size of the allocated stack region for the whole module.
   If the allocated stack region is smaller than the actual total size,
   an error is produced.
3. -evm-stack-region-offset=uint64_t
   Sets the offset where the stack region starts.
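To make the interplay between frame finalization and the two region options more concrete, here is a minimal sketch of how frame indices could be turned into absolute offsets. It rests on several assumptions that the description does not confirm: each spill slot is 32 bytes (one EVM word), the frames of different functions are laid out back to back starting at the region offset, and FrameLayout, layoutFrames, and frameIndexToOffset are hypothetical names rather than the API of EVMFinalizeStackFrames.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Assumption: every spill slot occupies one 32-byte EVM word.
constexpr std::uint64_t SlotSize = 32;

struct FrameLayout {
  std::vector<std::uint64_t> Base; // absolute start of each function's frame
  std::uint64_t TotalSize = 0;     // bytes used by all frames together
};

// Places the frames one after another, starting at RegionOffset.
// Returns std::nullopt when the frames do not fit into RegionSize, which
// corresponds to the error case described for -evm-stack-region-size.
std::optional<FrameLayout>
layoutFrames(const std::vector<std::uint64_t> &SlotsPerFunction,
             std::uint64_t RegionOffset, std::uint64_t RegionSize) {
  FrameLayout L;
  for (std::uint64_t Slots : SlotsPerFunction) {
    L.Base.push_back(RegionOffset + L.TotalSize);
    L.TotalSize += Slots * SlotSize;
  }
  if (L.TotalSize > RegionSize)
    return std::nullopt;
  return L;
}

// Replaces a (function, frame index) pair with a concrete memory offset,
// the kind of value a PUSH_FRAME operand would be rewritten to.
std::uint64_t frameIndexToOffset(const FrameLayout &L, std::size_t Function,
                                 std::uint64_t FrameIndex) {
  assert(Function < L.Base.size() && "unknown function");
  return L.Base[Function] + FrameIndex * SlotSize;
}
```

For three functions that need 2, 0, and 3 slots with a region offset of 128, the frames start at 128, 192, and 192 and use 160 bytes in total. A region size of 160 is therefore just enough, while 128 makes layoutFrames return std::nullopt.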