JIT: Shrink data section for const vector loads #114040
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
@tannergooding I've updated this based on the approach we discussed on Discord, including using broadcast loads where possible.
Latest diffs look good. It's overall a small code size regression, but generally we're trading +1 byte of code for -8 or more bytes of data.
case NI_AVX2_BroadcastVector128ToVector256:
case NI_AVX512F_BroadcastVector128ToVector512:
case NI_AVX512F_BroadcastVector256ToVector512:
    if (GetAuxiliaryJitType() == CORINFO_TYPE_PTR)
Most of these deletions are reverting questionable changes from #92017, which attempted to solve the same problem, but only for uncontained CNS_VEC directly consumed by STOREIND. Those cases are handled better by this PR.
case TYP_DOUBLE:
{
-    result->f64[arg1] = static_cast<float>(arg2);
+    result->f64[arg1] = arg2;
This, and the change in valuenum.cpp are unrelated things I noticed while reading how the SIMD constant folding works.
Yep, that's pretty typical. I'd actually like it if we had a way for SPMI to track the diff of the method-local data section as well to help account for such changes. Possibly @EgorBo or @jakobbotsch have ideas on whether that's feasible.
CC. @dotnet/jit-contrib, @EgorBo, @jakobbotsch for secondary review
Ping @dotnet/jit-contrib, @EgorBo, @jakobbotsch for secondary review
/azp run runtime-coreclr outerloop
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.
Noticed this while looking at the codegen for floating->integral casts, where we use vfixupimms[sd], a SIMD scalar instruction that takes a control table as a vector. The table value for that instruction is almost always constant, and it's the operand that supports a memory load, so we end up emitting a full SIMD16 vector to the data section for what ends up being a DWORD or QWORD reloc. This change identifies vector constants contained by instructions that read only a scalar and shrinks the emitted data section value down to the scalar size.

We were also emitting full vectors to the data section for uncontained CNS_VEC to be loaded to register for use. #92017 previously addressed compressing large vectors down to a SIMD16 broadcast when they were immediately consumed by STOREIND. This PR compresses down as far as a 4-byte scalar, and does it for all uncontained CNS_VEC loads.
Diffs