Addressed latest comments from hzongaro
gita-omr committed Jan 24, 2024
1 parent fa62645 commit 86a2859
Showing 1 changed file with 24 additions and 24 deletions: doc/compiler/osr/OSR.md
OMR provides infrastructure for involuntary OSR.

# Inducible and Uninducible OSR Yield Points

During voluntary OSR, the JIT can induce OSR after practically every OSR yield point and those points are called **inducible OSR yield points**. However, there are some OSR yield points after which the JIT cannot induce OSR. Those are referred to as **uninducible OSR yield points**. A typical example of an uninducible OSR yield point is when there is [a thunk archetype](https://github.com/eclipse-openj9/openj9/blob/0318ff92766bbbdf36c10a202b08b45ef1b6522f/doc/compiler/README.md?plain=1#L126) present on the inline stack.


# Pre- and Post-Execution OSR
Sometimes, a slot is shared by one or more JIT symbols.

As an example, let's consider compiled method A that calls method B.

If OSR is induced inside method A, values of all A's locals and operands at the OSR induction point need to be conveyed to the VM, along with the slot sharing info at that point.

If OSR happens within B and B is not inlined into A, two scenarios are possible:

If OSR happens within B and it is inlined into A, we need to know:
- values of all local and stack operand slots for A
- values of all local and stack operand slots for B
- slot sharing info for B at the OSR induction point in B
- slot sharing info for A at the point where A calls B. If there is another instance of B inlined into A, we would need yet another set of slot sharing info for A.

Therefore, the full set of necessary info is specific to the offset of the OSR induction point within the compiled body. This offset determines (1) the frames that need to be restored, and (2) how the slots need to be populated for each of those frames.
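As a hedged illustration of this offset-keyed bookkeeping (all names are invented for exposition; the real structures are OMR C++ classes described later in this document):

```python
# Hypothetical model of the offset-keyed bookkeeping described above.
# Keyed by the instruction offset of the OSR induction point within the
# compiled body. Each entry lists the frames to restore (innermost first)
# together with the slot-sharing info needed to populate their slots.
OSR_POINT_INFO = {
    0x40: [  # OSR induced inside B (inlined site 0) while inlined into A
        {"inlined_site_index": 0,  "shared_slot_syms": ["b_tmp"]},
        {"inlined_site_index": -1, "shared_slot_syms": ["a_x", "a_y"]},
    ],
}

def frames_to_restore(offset):
    """Frames (and their slot-sharing info) for one OSR induction point."""
    return OSR_POINT_INFO[offset]
```

Looking up the induction-point offset yields both the frames to rebuild and the slot-sharing info for each one.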


# OSR Points

OSR points are points in the program where the information above is collected, thus allowing the transition to happen at those points. OSR points are originally created just before or after OSR yield points during GenIL. While OSR yield points (such as asynccheck, monenter, etc.) can be optimized away later, an OSR transition can still happen at those points.

Notice that OSR points are not explicitly present in the IL or the generated code. They are just bookkeeping points that are used for making decisions during compile time as well as generating metadata to be used at runtime. Some OSR points are used purely for analysis and some will eventually become OSR induction points (in the case of voluntary OSR).


# Liveness Analysis

In order to create slot sharing info, liveness analysis is performed. Liveness info is collected at OSR points. In the example above, the OSR analysis points will be:

- the OSR induction point in B
- all points just before B is inlined into A.

Notice that, if B is inlined into A, B's parameters should be excluded from the live pending push temporaries since they are already popped from the stack at the time the OSR within B takes place.

In addition to creating slot sharing info, liveness analysis can also determine if some symbol is not used on a certain path and therefore does not need to be copied into its slot.
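A minimal backward-liveness sketch over an acyclic CFG (the acyclicity and the data layout are simplifying assumptions; the OMR analysis is more general). A symbol that is dead at an OSR point on some path does not need to be copied into its slot there:

```python
# Backward liveness: a symbol is live on entry to a block if it is used
# before being redefined on some path starting at that block.
def live_on_entry(block, succs, uses, defs, memo=None):
    if memo is None:
        memo = {}
    if block not in memo:
        live_out = set()
        for s in succs.get(block, ()):
            live_out |= live_on_entry(s, succs, uses, defs, memo)
        memo[block] = uses[block] | (live_out - defs[block])
    return memo[block]

# Tiny example: 'x' is used on the path through B1 but redefined on the
# path through B2, so at an OSR point in B0 it is live only because of B1.
succs = {"B0": ["B1", "B2"], "B1": [], "B2": []}
uses  = {"B0": set(), "B1": {"x"}, "B2": set()}
defs  = {"B0": set(), "B1": set(), "B2": {"x"}}
```

On the B2 path, `x` would not need to be stored into its slot at all.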


# Data Structures and Helpers

There are two types of data structures: ones that only exist at compile time and metadata that is used at run time.

## Compile time

**TR_OSRMethodData**: slot sharing info for each OSR point within one inlined site index. OSR point is identified by its bytecode index.

**TR_OSRPoint**: associates **TR_OSRMethodData** with a bytecode index.

**Q**: there seems to be a circular dependency between TR_OSRPoint and TR_OSRMethodData: the latter describes multiple points but is then associated with one point?

Array of **TR_OSRMethodData**: has an entry for each inlined site index in the method being compiled. Eventually, it is used to populate **Instr2SharedSlotMetaData**.
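A rough Python rendering of these shapes (the actual TR_OSRMethodData and TR_OSRPoint are C++ classes in OMR that carry more state; the fields below are assumptions for exposition):

```python
class TR_OSRMethodData:
    """Slot-sharing info for the OSR points of one inlined site index."""
    def __init__(self, inlined_site_index):
        self.inlined_site_index = inlined_site_index
        # bytecode index -> slot-sharing info at that OSR point
        self.slot_sharing_by_bci = {}

class TR_OSRPoint:
    """Associates a TR_OSRMethodData with one bytecode index."""
    def __init__(self, method_data, bci):
        self.method_data = method_data
        self.bci = bci

# One entry per inlined site index of the method being compiled; later
# used to populate Instr2SharedSlotMetaData.
osr_method_data = [TR_OSRMethodData(-1), TR_OSRMethodData(0)]
point = TR_OSRPoint(osr_method_data[1], bci=12)
```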

## Run time

**OSRBuffer**: the buffer that is populated by compiled code with locals and operands for all active frames. It is used by the VM to reconstruct the frames. **OSRBuffer** is created by the VM. Since the VM is aware of the number of auto slots, parameter slots,
and the maximum number of pending push temporary slots, it can always ensure the buffer is large enough to hold all autos,
parameters and pending push temporaries for all interpreter frames the VM needs to recreate at a particular OSR point.
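The sizing argument can be sketched as follows (the flat slot layout is an assumption; the point is only that known per-frame slot counts give a conservative bound):

```python
def osr_buffer_slots(frames):
    """frames: one (auto_slots, param_slots, max_pending_push_slots)
    tuple per interpreter frame the VM must recreate."""
    return sum(autos + params + pps for autos, params, pps in frames)

# Outermost A plus inlined B: room for both interpreter frames.
total = osr_buffer_slots([(4, 2, 3), (2, 1, 2)])  # 9 + 5 slots
```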

**OSRScratchBuffer**: the buffer is populated by compiled code with symbols that share slots. It is used by **prepareForOSR** together with **Instr2SharedSlotMetaData** to copy the right symbols into **OSRBuffer**.

**Instr2SharedSlotMetaData**: each entry contains an instruction offset of an OSR induction point within the compiled method, followed by an array of tuples. Each tuple contains an inlined site index and indicates from which offset in **OSRScratchBuffer** to which offset in **OSRBuffer** the symbol needs to be copied by the corresponding **prepareForOSR**. Therefore, given an offset of an OSR induction point, this metadata describes which JIT symbols need to be copied into **OSRBuffer** for each frame.

**prepareForOSR**: a helper call that is inserted in the IL with the purpose of keeping all necessary symbols live at a transition point. It also "logically" copies JIT symbols into **OSRBuffer**. During tree lowering, however, symbols are explicitly copied in the IL, except for the ones that share slots; the helper then remains only for copying symbols that require slot sharing info. **prepareForOSR** takes its inlined site index as a parameter.
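A hedged sketch of the copying step the helper performs (the tuple layout mirrors the **Instr2SharedSlotMetaData** description above; the dict-based buffers and names are assumptions, not the real raw-memory layout):

```python
# Instr2SharedSlotMetaData, modeled as: OSR induction-point offset ->
# list of (inlined_site_index, scratch_offset, buffer_offset) tuples.
INSTR2_SHARED_SLOT = {
    0x40: [(0, 0, 8), (-1, 4, 0)],
}

def prepare_for_osr(offset, inlined_site_index, scratch, buffer):
    """Copy the shared-slot symbols belonging to one frame from
    OSRScratchBuffer into their final OSRBuffer slots."""
    for site, src, dst in INSTR2_SHARED_SLOT[offset]:
        if site == inlined_site_index:
            buffer[dst] = scratch[src]

# One call per frame being restored, innermost first:
scratch = {0: "b_val", 4: "a_val"}
buffer = {}
prepare_for_osr(0x40, 0, scratch, buffer)   # B's frame
prepare_for_osr(0x40, -1, scratch, buffer)  # A's frame
```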


# IL Generation

During IL generation, we examine each bytecode to see if it is an OSR yield point. If the current bytecode is an OSR yield point,
we first store all operands that are currently on the stack into pending push temporaries. An **OSR catch block** for the current inlined site index is created if it does not exist yet. An exception edge is then added between the current block and the OSR catch block.

Each OSR catch block has a single successor that is called an **OSR code block**. A **prepareForOSR** call node is added into that OSR code block. The node takes loads of all autos, parameters, and pending push temporaries used in the current method (but not methods inlined into it) as its arguments so that they will remain live at OSR points. It also takes its inlined site index as an argument.
Creating the exception blocks and edges prevents splitting of the block where the inspected bytecode resides,
hence allowing local optimizations to work on the same block with the additional restrictions imposed by these OSR points as they can now cause OSR exceptions.

In our example, for an OSR yield point in B that is inlined into A, we create two OSR catch blocks and their corresponding OSR code blocks:

1) one OSR catch block and a following OSR code block which stores all A's autos, parameters, and pending push temporaries and then returns control back to the VM;

2) one OSR catch block and a following OSR code block which stores all B's autos, parameters, and pending push temporaries and then goes to the OSR code block for A.


The reason for having both the OSR catch block (initially empty) and the following OSR code block is that, when we want to call A's **prepareForOSR** after calling B's, we do not want a regular edge into A's catch block in addition to the exception edge, since having both a regular and an exception edge into the same block is not supported in the IL.


During lowering, in an OSR code block, we prepend a sequence of stores of autos, parameters, and pending push temporaries that don't share any slots directly into **OSRBuffer**. The ones that are shared are stored into **OSRScratchBuffer**. The call to **prepareForOSR** only remains if slot sharing is present at least on one path leading to it.
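The lowering rule can be sketched as a simple partition (names assumed; this simplifies the "at least one path" condition to a symbol-level check for brevity):

```python
# Non-sharing symbols are stored straight into OSRBuffer; sharing ones go
# to OSRScratchBuffer, and the prepareForOSR call survives only if some
# symbol shares a slot.
def lower_osr_stores(symbols, shares_slot):
    direct, scratch = [], []
    for sym in symbols:
        (scratch if shares_slot(sym) else direct).append(sym)
    keep_prepare_for_osr = bool(scratch)
    return direct, scratch, keep_prepare_for_osr
```

If no symbol reaching the OSR code block shares a slot, the helper call is dropped entirely and only the direct stores remain.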


Here is a CFG example when method B is inlined into method A twice:
| |
--------------------- ---------------------
| OSR code block |___>| OSR code block |___> call to VM
|prepareForOSR | |prepareForOSR |
| iconst 0 | | iconst -1 |
--------------------- ---------------------
^
| |
--------------------- |
| OSR code block |____________|
|prepareForOSR |
| iconst 1 |
---------------------
In the case of involuntary OSR, any OSR yield point can trigger an OSR.
If OSR is requested, it is treated as an exception and control flow is transferred to the corresponding OSR catch block.
The corresponding OSR code block copies all the necessary parameters, autos, and pending push temporaries into **OSRBuffer**. Symbols that share slots are copied into **OSRScratchBuffer**.
If shared slots are present, **prepareForOSR** decides which symbols have to be copied to which slot based on **Instr2SharedSlotMetaData**, the inlined site index passed to it, and the address of the OSR induction point that is known to the VM at the point of exception.

At the end of each OSR code block, we either return control back to the VM by calling the runtime OSR helper if the current method is not inlined, or we go to the OSR code block of the caller of the inlined method. This way, **prepareForOSR** is called for each frame that needs to be restored and finds all the necessary info inside **Instr2SharedSlotMetaData**.

Generally, execution is transitioned from optimized code to interpreted code once we are in OSR mode.
However, there is an exception to this rule. If both the VM and the JIT agree that we can resume executing optimized code after reconstructing interpreter stack frames,
