ethereum · gcolvin · Sep 24, 2021
@@ -14,11 +14,11 @@ requires: 3540, 3670
 
 This proposal introduces four opcodes to support simple subroutines and relative jumps: `JUMPSUB`, `RETURNSUB`, `RJUMP` and `RJUMPI`.
 
-This change supports substantial reductions in the complexity and the gas costs of calling and optimizing simple subroutines – from %33 to as much as 52% savings in gas.
+These provide a complete static control-flow facility that supports substantial reductions in the complexity and the gas costs of calling and optimizing simple subroutines – from 33% to as much as 52% savings in gas.
 
 ## Motivation
 
-The EVM does not provide subroutines as a primitive.  Instead, calls can be synthesized by fetching and pushing the current program counter on the data stack and jumping to the subroutine address; returns can be synthesized by getting the return address to the top of the stack and jumping back to it.  These conventions are more costly and complex than necessary.  This cost and complexity is borne by the humans and programs writing, reading, and analyzing EVM code,
+The EVM does not provide subroutines as a primitive.  Instead, calls can be synthesized by fetching and pushing the current program counter on the data stack and jumping to the subroutine address; returns can be synthesized by getting the return address to the top of the stack and jumping back to it.  These conventions create unnecessary cost and complexity that are borne by the humans and programs writing, reading, and analyzing EVM code.
 
 Facilities to directly support subroutines are provided by all but one of the real and virtual machines programmed by the lead author, including the Burroughs 5000, CDC 7600, IBM 360, DEC PDP 11 and VAX, Motorola 68000, a few generations of Intel silicon, Sun SPARC, UCSD p-Machine, Sun JVM, Wasm, and the sole exception -- the EVM.  In whatever form, these operations provide for
 * capturing the current context of execution,
@@ -34,107 +34,9 @@ The concept goes back to [Turing, 1946](http://www.alanturing.net/turing_archive
 
 We propose to follow Turing's simple concept in our subroutine design, as specified below.
 
-_Note that this specification is entirely semantic.  It constrains only data usage and control flow and imposes no syntax on code beyond being a sequence of bytes to be executed._
-
-## Specification
-
-We introduce one more stack into the EVM in addition to the existing `data stack`, which we call the `return stack`. The `return stack` is limited to `1024` items. This stack supports three new instructions for subroutines.
-
-### Instructions
-
-#### `JUMPSUB (0x5e) location`
-
-> Transfers control to a subroutine.
->
-> 1. Decode the `location` from the immediate data.  The data is encoded as three bytes, MSB-first.
-> 2. If the opcode at `location` is not a `JUMPDEST` _`abort`_.
-> 3. If the `return stack` already has `1024` items _`abort`_.
-> 4. Push the current `PC + 1` to the `return stack`.
-> 5. Set `PC` to `location`.
->
->  The cost is _low_.
->  
-> * _pops one item off the `data stack`_
-> * _pushes one item on the `return stack`_
-
-#### `RETURNSUB (0x5f)`
-
-> Returns control to the caller of a subroutine.
-> 
-> 1. If the `return stack` is empty _`abort`_.
-> 2. Pop `PC` off the `return stack`.
->
-> The cost is _verylow_.
->
-> * _pops one item off the `return stack`_
-
-To provide a complete set of control structures, and to take full advantage of the performance benefits of simple subroutines we also provide two  static, relative jump functions that take their arguments as immediate data rather then off the stack.
-
-#### `RJUMP (0x5c) offset`
-
-> Transfers control to the address `PC + offset`, where offset is a three-byte, MSB first, twos-complement integer.
->
-> 1. Decode the `offset` from the immediate data.  The data is encoded as three bytes, MSB first, twos-complement.
-> 2. If the opcode at `location` is not a `JUMDEST` then _`abort`_.
-> 5. Set `PC` to `location`.
->
->  The cost is _low_.
-
-#### `RJUMPI (0x5d) offset`
-
-> Conditionally transfers control to the address `PC + offset`, where offset is a three-byte, MSB first, twos-complement integer.
-> 1. Decode the `offset` from the immediate data.  The data is encoded as three bytes, MSB first, twos-complement.
-> 2. Pop the `condition` from the stack.
-> 3. If the condition is true then continue
-> 4. If the opcode at `PC + offset` is not a `JUMDEST` _`abort`_.
->  Set `PC` to `PC + offset`.
->
->  The cost is _mid_.
-
-_Notes:_
-* _If a resulting `PC` to be executed is beyond the last instruction then the opcode is implicitly a `STOP`, which is not an error._
-* _Values popped off the `return stack` do not need to be validated, since they are alterable only by `JUMPSUB` and `RETURNSUB`._
-* _The description above lays out the semantics of this feature in terms of a `return stack`.  But the actual state of the `return stack` is not observable by EVM code or consensus-critical to the protocol.  (For example, a node implementer may code `JUMPSUB` to unobservably push `PC` on the `return stack` rather than `PC + 1`, which is allowed so long as `RETURNSUB` observably returns control to the `PC + 1` location.)_
-* _The `return stack` is the functional equivalent of Turing's "delay line"._
-
-`JUMP` and `JUMPI` are assigned _mid_ and _high_ gas fees -- they require operations on 256-bit stack items and checking for valid destinations.   None of these operations require checking, and only `RJUMPI` requires 256-bit arithmetic.  So the _low_ cost of `JUMPSUB` versus is justified by needing only about six Go operations to push the return address on the return stack and decode the immediate two byte destination to the `PC` and the _verylow_ cost of `RETURNSUB` is justified by needing only about three Go operations to pop the return stack into the `PC`.  Similarly, the _low_ cost of `RJUMP` is justified by needing even less work than `JUMPSUB`, and the cost `RJUMPI` is `mid` because of the extra work to test the conditional.   Benchmarking will be needed to tell if the costs are well-balanced. 
-
-### Validity
-
-Attempts to create contracts that can be proven to be invalid will fail.
-
-Invalid contracts will have one or more
-*  invalid instructions,
-*  invalid jump destinations,
-*  stack underflows, or
-*  stack overflows without recursion.
-
-Because of  dynamic JUMP and JUMPI instructions  the rules below give necessary but not sufficient conditions for validity.  So long as we have unrestricted dynamic jumps we cannot prove contracts to be valid,  only invalid.  (See [EIP-3779](./eip-3779.md) for further discussion of validation and a means of restricting dynamic jumps.)
-
-### Validation Rules
-
-> This section extends the contact creation validation rules (as defined in EIP-3540 and EIP-3670.)
-1. Every `RJUMP` and `RJUMPI` addresses a valid `JUMPDEST`.
-2. The stack depth is
-   * always positive and
-   * the same on every path through an opcode.
-3. The `stack pointer` is always positive and at most 1024.
-
-The Yellow Paper has the `stack pointer` (`SP`) pointing just past the top item on the `data stack`.   We define the `stack base` (`BP`)as the element that the `SP` addressed at the entry to the current _basic block_, or `0` on program entry, and the `stack depth` as the number of stack elements between the current `SP` and the current `BP`.
-
-Taken together, these rules allow for code to be (in-)validated by traversing the control-flow graph, following each edge only once.  
-
-## Rationale
-
-We modeled this design on Moore's 1970 [Forth virtual machine](http://www.ultratechnology.com/4th_1970.pdf). It is a simple two-stack design – the data stack is supplemented with a return stack to support jumping to and returning from subroutines, as specified above, and as conceptualized by Turing.  The return address (Turing's "note") is pushed onto the return stack (Turing's "delay line") when calling, and the return address is popped into the `PC` when returning.
-
-The alternative design is to push the return address and the destination address on the data stack before jumping to the subroutine, and to later jump back to the return address on the stack in order to return.  This is the current approach.  It could be streamlined to some extent by having JUMPSUB push the return address for RETURNSUB to pop.
-
-We prefer the separate return stack because it maintains a clear separation between data and flow of control.  This ensures that the return address cannot be overwritten or mislaid.  It also reduces costs by using fewer data stack slots and moving less data.
-
 ### Gas Cost Analysis
 
-We show here how these opcodes can be used to reduce the gas costs of both ordinary subroutine calls and low-level optimizations.  The savings reported here will of course be less relevant to programs that use a few large subroutines rather than being a factored than into smaller ones.   The choice of gas costs for the new opcodes above does not make a large difference in this analysis, as much of the improvement is due to PUSH and SWAP operations that are no longer needed.  Even if `JUMPSUB` cost the same as `JUMP` – 8 gas rather than 5 - a simple subroutine call would still be 48% less costly versus 52%.
+We show here how these opcodes can be used to reduce the gas costs of both ordinary subroutine calls and low-level optimizations.  The savings reported here will of course be less relevant to programs that use a few large subroutines rather than being factored into smaller ones.   The choices of gas costs for the new opcodes below do not make a large difference in this analysis, as much of the improvement is due to PUSH and SWAP operations that are no longer needed.  Even if `JUMPSUB` cost the same as `JUMP` – 8 gas rather than 5 – a simple subroutine call would still be 48% less costly, versus 52%.
 
 #### **Simple Subroutine Call**
 
@@ -272,6 +174,105 @@ SQUARE:
 ```
 Total 31 gas, compared to 24 gas for the return stack version.
 
+_Note that this specification is entirely semantic.  It constrains only data usage and control flow and imposes no syntax on code beyond being a sequence of bytes to be executed._
+
+## Specification
+
+We introduce one more stack into the EVM in addition to the existing `data stack`, which we call the `return stack`. The `return stack` is limited to `1024` items. This stack supports two new instructions for subroutines.
+
+### Instructions
+
+#### `JUMPSUB (0x5e) location`
+
+> Transfers control to a subroutine.
+>
+> 1. Decode the `location` from the immediate data.  The data is encoded as three bytes, MSB-first.
+> 2. If the opcode at `location` is not a `JUMPDEST` _`abort`_.
+> 3. If the `return stack` already has `1024` items _`abort`_.
+> 4. Push the current `PC + 1` to the `return stack`.
+> 5. Set `PC` to `location`.
+>
+>  The cost is _low_.
+>  
+> * _pops one item off the `data stack`_
+> * _pushes one item on the `return stack`_
+
+#### `RETURNSUB (0x5f)`
+
+> Returns control to the caller of a subroutine.
+> 
+> 1. If the `return stack` is empty _`abort`_.
+> 2. Pop `PC` off the `return stack`.
+>
+> The cost is _verylow_.
+>
+> * _pops one item off the `return stack`_
+
+To provide a complete set of control structures, and to take full advantage of the performance benefits of simple subroutines, we also provide two static, relative jump instructions that take their arguments as immediate data rather than off the stack.
+
+#### `RJUMP (0x5c) offset`
+
+> Transfers control to the address `PC + offset`, where offset is a three-byte, MSB first, twos-complement integer.
+>
+> 1. Decode the `offset` from the immediate data.  The data is encoded as three bytes, MSB first, twos-complement.
+> 2. If the opcode at `PC + offset` is not a `JUMPDEST` then _`abort`_.
+> 3. Set `PC` to `PC + offset`.
+>
+>  The cost is _low_.
+
+#### `RJUMPI (0x5d) offset`
+
+> Conditionally transfers control to the address `PC + offset`, where offset is a three-byte, MSB first, twos-complement integer.
+> 1. Decode the `offset` from the immediate data.  The data is encoded as three bytes, MSB first, twos-complement.
+> 2. Pop the `condition` from the stack.
+> 3. If the `condition` is false (zero) then continue to the next instruction.
+> 4. If the opcode at `PC + offset` is not a `JUMPDEST` then _`abort`_.
+> 5. Set `PC` to `PC + offset`.
+>
+>  The cost is _mid_.
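
The offset decoding described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the proposal: the function names `decode_offset` and `rjump_target` are hypothetical, and it assumes that `PC` in `PC + offset` is the address of the `RJUMP` or `RJUMPI` opcode itself.

```python
JUMPDEST = 0x5B


def decode_offset(code: bytes, pc: int) -> int:
    """Decode the three-byte, MSB-first, two's-complement immediate after the opcode at pc."""
    raw = code[pc + 1 : pc + 4]
    if len(raw) != 3:
        raise ValueError("truncated immediate data")
    return int.from_bytes(raw, byteorder="big", signed=True)


def rjump_target(code: bytes, pc: int) -> int:
    """Return PC + offset, or raise if the target is not a JUMPDEST.

    Assumption: PC is the address of the jump opcode, not of the byte after it.
    """
    target = pc + decode_offset(code, pc)
    if not (0 <= target < len(code)) or code[target] != JUMPDEST:
        raise ValueError("target is not a JUMPDEST")
    return target
```

For example, the immediate bytes `ff ff fd` decode to `-3`, so a jump at address 3 with that immediate targets address 0.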
+
+_Notes:_
+* _If a resulting `PC` to be executed is beyond the last instruction then the opcode is implicitly a `STOP`, which is not an error._
+* _Values popped off the `return stack` do not need to be validated, since they are alterable only by `JUMPSUB` and `RETURNSUB`._
+* _The description above lays out the semantics of this feature in terms of a `return stack`.  But the actual state of the `return stack` is not observable by EVM code or consensus-critical to the protocol.  (For example, a node implementer may code `JUMPSUB` to unobservably push `PC` on the `return stack` rather than `PC + 1`, which is allowed so long as `RETURNSUB` observably returns control to the `PC + 1` location.)_
+* _The `return stack` is the functional equivalent of Turing's "delay line"._
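
As a rough illustration of the `return stack` semantics, here is a toy Python interpreter that handles only `JUMPSUB`, `RETURNSUB`, `JUMPDEST` and `STOP`. It is a sketch under stated assumptions, not a reference implementation: the names `run` and `Abort` are hypothetical, every other opcode is treated as a one-byte no-op, and the return address is taken to be the instruction following the three-byte immediate (that is, `PC + 4`), which is one reading of "returns control to the next instruction".

```python
JUMPDEST, JUMPSUB, RETURNSUB, STOP = 0x5B, 0x5E, 0x5F, 0x00
RETURN_STACK_LIMIT = 1024


class Abort(Exception):
    """Raised where the specification says to abort."""


def run(code: bytes, max_steps: int = 10_000) -> list:
    """Execute code and return the list of PCs at which an opcode was executed."""
    pc, return_stack, trace = 0, [], []
    for _ in range(max_steps):
        if pc >= len(code):
            break  # running past the end is an implicit STOP
        op = code[pc]
        trace.append(pc)
        if op == STOP:
            break
        if op == JUMPSUB:
            if pc + 4 > len(code):
                raise Abort("truncated immediate data")
            location = int.from_bytes(code[pc + 1 : pc + 4], "big")
            if location >= len(code) or code[location] != JUMPDEST:
                raise Abort("JUMPSUB target is not a JUMPDEST")
            if len(return_stack) >= RETURN_STACK_LIMIT:
                raise Abort("return stack overflow")
            # Assumption: return to the instruction after the immediate data.
            return_stack.append(pc + 4)
            pc = location
        elif op == RETURNSUB:
            if not return_stack:
                raise Abort("return stack is empty")
            pc = return_stack.pop()
        else:
            pc += 1  # JUMPDEST and all other opcodes are no-ops in this toy
    return trace
```

Running it on a program that calls a subroutine at address 5 and then stops shows control leaving address 0, visiting the subroutine, and returning to the `STOP` at address 4.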
+
+`JUMP` and `JUMPI` are assigned _mid_ and _high_ gas fees -- they require operations on 256-bit stack items and checking for valid destinations.   None of the operations introduced here require such checking, and only `RJUMPI` requires 256-bit arithmetic.  So the _low_ cost of `JUMPSUB` is justified by needing only about six Go operations to push the return address on the return stack and decode the immediate three-byte destination to the `PC`, and the _verylow_ cost of `RETURNSUB` is justified by needing only about three Go operations to pop the return stack into the `PC`.  Similarly, the _low_ cost of `RJUMP` is justified by needing even less work than `JUMPSUB`, and the cost of `RJUMPI` is _mid_ because of the extra work to test the conditional.   Benchmarking will be needed to tell if the costs are well-balanced.
+
+### Validity
+
+Attempts to create contracts that can be proven to be invalid will fail.
+
+Invalid contracts will have one or more
+*  invalid instructions,
+*  invalid jump destinations,
+*  stack underflows, or
+*  stack overflows without recursion.
+
+Because of dynamic `JUMP` and `JUMPI` instructions, the rules below give necessary but not sufficient conditions for validity.  So long as we have unrestricted dynamic jumps we cannot prove contracts to be valid, only invalid.  (See [EIP-3779](./eip-3779.md) for further discussion of validation and a means of restricting dynamic jumps.)
+
+### Validation Rules
+
+> This section extends the contract creation validation rules (as defined in EIP-3540 and EIP-3670).
+1. Every `RJUMP` and `RJUMPI` addresses a valid `JUMPDEST`.
+2. The stack depth is
+   * always positive and
+   * the same on every path through an opcode.
+3. The `stack pointer` is always positive and at most 1024.
+
+The Yellow Paper has the `stack pointer` (`SP`) pointing just past the top item on the `data stack`.   We define the `stack base` (`BP`) as the element that the `SP` addressed at the entry to the current _basic block_, or `0` on program entry, and the `stack depth` as the number of stack elements between the current `SP` and the current `BP`.
+
+Taken together, these rules allow for code to be (in-)validated by traversing the control-flow graph, following each edge only once.  
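
Rule 1 can be checked with a single linear pass over the code. The Python sketch below is illustrative only and covers just that rule, not the stack-depth rules. The helper names are hypothetical, and it assumes that `RJUMP`, `RJUMPI` and `JUMPSUB` each carry three bytes of immediate data, that `PUSH1` through `PUSH32` carry 1 to 32 bytes, and that `PC + offset` is measured from the address of the jump opcode.

```python
JUMPDEST, RJUMP, RJUMPI, JUMPSUB = 0x5B, 0x5C, 0x5D, 0x5E
PUSH1, PUSH32 = 0x60, 0x7F


def instruction_starts(code: bytes) -> set:
    """Collect the addresses that begin an instruction, skipping immediate data."""
    starts, pc = set(), 0
    while pc < len(code):
        starts.add(pc)
        op = code[pc]
        if PUSH1 <= op <= PUSH32:
            pc += 1 + (op - PUSH1 + 1)  # opcode plus 1..32 data bytes
        elif op in (RJUMP, RJUMPI, JUMPSUB):
            pc += 4  # opcode plus three bytes of immediate data (assumed)
        else:
            pc += 1
    return starts


def relative_jumps_valid(code: bytes) -> bool:
    """Return True if every RJUMP and RJUMPI addresses a JUMPDEST opcode."""
    starts = instruction_starts(code)
    for pc in sorted(starts):
        if code[pc] in (RJUMP, RJUMPI):
            if pc + 4 > len(code):
                return False  # truncated immediate data
            offset = int.from_bytes(code[pc + 1 : pc + 4], "big", signed=True)
            target = pc + offset
            # A 0x5b byte inside immediate data is not a real JUMPDEST.
            if target not in starts or code[target] != JUMPDEST:
                return False
    return True
```

Tracking instruction boundaries matters because a byte equal to `0x5b` inside `PUSH` data would otherwise be mistaken for a valid destination.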
+
+## Rationale
+
+We modeled this design on Moore's 1970 [Forth virtual machine](http://www.ultratechnology.com/4th_1970.pdf). It is a simple two-stack design – the data stack is supplemented with a return stack to support jumping to and returning from subroutines, as specified above, and as conceptualized by Turing.  The return address (Turing's "note") is pushed onto the return stack (Turing's "delay line") when calling, and the return address is popped into the `PC` when returning.
+
+The alternative design is to push the return address and the destination address on the data stack before jumping to the subroutine, and to later jump back to the return address on the stack in order to return.  This is the current approach.  It could be streamlined to some extent by having JUMPSUB push the return address for RETURNSUB to pop.
+
+We prefer the separate return stack because it maintains a clear separation between data and flow of control.  This ensures that the return address cannot be overwritten or mislaid.  It also reduces costs by using fewer data stack slots and moving less data.
+
+
 ## Backwards Compatibility
 
 These changes do not affect the semantics of existing EVM code.  These changes are compatible with the restricted forms of `JUMP` and `JUMPI` specified by [EIP-3779](./eip-3779.md)  -- contracts following all of the rules given there and here will be valid.