Skip to content

Conversation

saucecontrol
Copy link
Member

@saucecontrol saucecontrol commented Mar 29, 2025

Noticed this while looking at the codegen for floating->integral casts where we use vfixupimms[sd], which is a SIMD scalar instruction that takes a control table as a vector. The table value for that instruction is almost always const, and it's the operand that supports memory load, so we end up emitting a full SIMD16 vector to the data section for what ends up being a DWORD or QWORD reloc. This change identifies vector constants contained by instructions that read only a scalar and shrinks the emitted data section value down to the scalar size.

We were also emitting full vectors to the data section for uncontained CNS_VEC to be loaded to register for use. #92017 had previously addressed compressing large vectors down to SIMD16 broadcast when they were immediately consumed by STOREIND. This PR does compression down as far as 4-byte scalar, and does it for all uncontained CNS_VEC loads.

Diffs

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 29, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Mar 29, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copy link
Member Author

@saucecontrol saucecontrol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tannergooding I've updated this based on the approach we discussed on Discord, including using broadcast loads where possible.

Latest diffs look good. It's overall a small code size regression, but generally we're trading +1 byte of code for -8 or more bytes of data.

case NI_AVX2_BroadcastVector128ToVector256:
case NI_AVX512F_BroadcastVector128ToVector512:
case NI_AVX512F_BroadcastVector256ToVector512:
if (GetAuxiliaryJitType() == CORINFO_TYPE_PTR)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these deletions are reverting questionable changes from #92017, which attempted to solve the same problem, but only for uncontained CNS_VEC directly consumed by STOREIND. Those cases are handled better by this PR.

case TYP_DOUBLE:
{
result->f64[arg1] = static_cast<float>(arg2);
result->f64[arg1] = arg2;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, and the change in valuenum.cpp are unrelated things I noticed while reading how the SIMD constant folding works.

@saucecontrol saucecontrol changed the title JIT: Shrink data for scalar instructions with contained const vectors JIT: Shrink data section for const vector loads Apr 2, 2025
@tannergooding
Copy link
Member

It's overall a small code size regression, but generally we're trading +1 byte of code for -8 or more bytes of data.

Yep, that's pretty typical. I'd actually like if we had a way for SPMI to track the diff of the method local data section as well to help account for such changes. Possibly @EgorBo or @jakobbotsch have ideas on if that's feasible.

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC. @dotnet/jit-contrib, @EgorBo, @jakobbotsch for secondary review

@tannergooding
Copy link
Member

Ping @dotnet/jit-contrib, @EgorBo, @jakobbotsch for secondary review

@BruceForstall
Copy link
Contributor

/azp run runtime-coreclr outerloop

Copy link

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@BruceForstall BruceForstall merged commit 8b00880 into dotnet:main May 2, 2025
105 of 108 checks passed
@saucecontrol saucecontrol deleted the shrink-relocs branch May 4, 2025 15:20
@github-actions github-actions bot locked and limited conversation to collaborators Jun 4, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants