Skip to content

Conversation

@kunalspathak
Copy link
Contributor

@kunalspathak kunalspathak commented Sep 4, 2021

Problem:
We assign a register to a variable that has zero initialization, but in the same block if there is another variable that needs the same register (eg. ABI requirement of FixedRegister), we would go ahead and assigning it although the earlier zeroInit variable was live into it. This happens because in the same block, we will only spill the variable at the end of that block and until then, the register stays assigned to the zero initialized variable.

We have enabled EH Write thru only for single-def variable. When we think that there could be a zero initialization, it technically means multiple definition and we bailout the EH write thru optimization. However, as seen in this comment, the condition to check if we will zero-init a variable or not is different during liveness vs. what we use in LSRA. Because of that, we ended up enregistering an EH var which got a register assigned during zero initialization. This works most often because at the end up block, we would record that the variable is live on stack (since writeThru=true). But if there was another variable in the same block that needs same register (eg. ABI requirement of FixedRegister), we would go ahead and assigning the same register that was earlier assigned to the zero initialized EH var causing the assert.
This is very uncommon situation and hence we haven't seen it in our test. For this to happen, we need to have an EH var which is not a param and is live-in inside the first block. In such situations, we would insert zero initiazation refposition. Next, there has to be another variable in the same block that needs same register which will trigger this issue.

There are 2 options to fix this problem:

  1. We can be consistent in our check of zero-initialization here and use same condition that we use in LSRA, but by doing that we miss lot of local variables from getting enregistered and benefit from this optimization. In Enregister EH var that are single def #47307, I tried it out and saw lot of regressions because of missed oppportunity. I again gathered those asmdiff numbers and the result is same, lot of regressions.
  2. Another and better option is to not assign register for a RefTypeZeroInit refposition if the corresponding interval is writeThru. It works since we will be zero initialization the variable on stack and hence there is no point in assign a register at the zero-initialization point and not getting utilized. Once we delay the register allocation for this scenario, it opens an opportunity for other better and important refPositions to consume that register and giving us better code quality. I am see some good performance improvements by doing this. I did analyze some of the regressions and they happen for similar reason. We would allocate a high demanding register to a variable which gets spilled because other varibles need them. Earlier, this would have simply got assigned to the variable that was zero-initialized EH-write thru.

CodeSize/PerfScore Improvements: https://gist.github.com/kunalspathak/ce03a767cb67bf1e3311edbc099e7731

Fixes: #58539

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 4, 2021
@ghost
Copy link

ghost commented Sep 4, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Problem:
We assign a register to a variable that has zero initialization, but in the same block if there is another variable that needs the same register (eg. ABI requirement of FixedRegister), we would go ahead and assigning it although the earlier zeroInit variable was live into it. This happens because in the same block, we will only spill the variable at the end of that block and until then, the register stays assigned to the zero initialized variable.

We have enabled EH Write thru only for single-def variable. When we think that there could be a zero initialization, it technically means multiple definition and we bailout the EH write thru optimization. However, as seen in this comment, the condition to check if we will zero-init a variable or not is different during liveness vs. what we use in LSRA. Because of that, we ended up enregistering an EH var which got a register assigned during zero initialization. This works most often because at the end up block, we would record that the variable is live on stack (since writeThru=true). But if there was another variable in the same block that needs same register (eg. ABI requirement of FixedRegister), we would go ahead and assigning the same register that was earlier assigned to the zero initialized EH var causing the assert.
This is very uncommon situation and hence we haven't seen it in our test. For this to happen, we need to have an EH var which is not a param and is live-in inside the first block. In such situations, we would insert zero initiazation refposition. Next, there has to be another variable in the same block that needs same register which will trigger this issue.

There are 2 options to fix this problem:

  1. We can be consistent in our check of zero-initialization here and use same condition that we use in LSRA, but by doing that we miss lot of local variables from getting enregistered and benefit from this optimization. In Enregister EH var that are single def #47307, I tried it out and saw lot of regressions because of missed oppportunity. I again gathered those asmdiff numbers and the result is same, lot of regressions.
  2. Another and better option is to not assign register for a RefTypeZeroInit refposition if the corresponding interval is writeThru. It works since we will be zero initialization the variable on stack and hence there is no point in assign a register at the zero-initialization point and not getting utilized. Once we delay the register allocation for this scenario, it opens an opportunity for other better and important refPositions to consume that register and giving us better code quality. I am see some good performance improvements by doing this. I did analyze some of the regressions and they happen for similar reason. We would allocate a high demanding register to a variable which gets spilled because other varibles need them. Earlier, this would have simply got assigned to the variable that was zero-initialized EH-write thru.
Author: kunalspathak
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@kunalspathak
Copy link
Contributor Author

I need to trigger superpmi replay pipeline but it looks like superpmi collections are outdated, so I have triggered the colllection pipeline after which I will trigger superpmi-replay pipeline.

@kunalspathak kunalspathak marked this pull request as ready for review September 4, 2021 17:22
@kunalspathak
Copy link
Contributor Author

@dotnet/jit-contrib

@kunalspathak kunalspathak changed the title Zeroinit delay alloc Skip allocation for ZeroInit writeThru intervals Sep 7, 2021
@kunalspathak
Copy link
Contributor Author

I tried to craft a test case, but it doesn't hit the circumstances under which this issue is triggered. Perhaps, once we decide if we want to be consistent here or not, we can add an assert in lsra to make sure that we are only doing writeThru for cases that doesn't need zero initialization.

@kunalspathak kunalspathak merged commit 5c479bb into dotnet:main Sep 7, 2021
@kunalspathak
Copy link
Contributor Author

/backport to release/6.0

@kunalspathak kunalspathak deleted the zeroinit-delay-alloc branch September 7, 2021 20:32
@github-actions
Copy link
Contributor

github-actions bot commented Sep 7, 2021

Started backporting to release/6.0: https://github.com/dotnet/runtime/actions/runs/1210903846

@ghost ghost locked as resolved and limited conversation to collaborators Oct 7, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Assertion failed '(gcInfo.gcRegGCrefSetCur & (RBM_CALLEE_TRASH & ~RBM_ARG_REGS)) == 0'

2 participants