JIT: Shrink data section for const vector loads #114040
base: main
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
@tannergooding I've updated this based on the approach we discussed on Discord, including using broadcast loads where possible.
Latest diffs look good. It's overall a small code size regression, but generally we're trading +1 byte of code for -8 or more bytes of data.
src/coreclr/jit/gentree.cpp
```diff
@@ -28266,9 +28266,6 @@ bool GenTreeHWIntrinsic::OperIsMemoryLoad(GenTree** pAddr) const
             case NI_AVX2_ConvertToVector256Int16:
             case NI_AVX2_ConvertToVector256Int32:
             case NI_AVX2_ConvertToVector256Int64:
-            case NI_AVX2_BroadcastVector128ToVector256:
-            case NI_AVX512F_BroadcastVector128ToVector512:
-            case NI_AVX512F_BroadcastVector256ToVector512:
                 if (GetAuxiliaryJitType() == CORINFO_TYPE_PTR)
```
Most of these deletions are reverting questionable changes from #92017, which attempted to solve the same problem, but only for uncontained CNS_VEC directly consumed by STOREIND. Those cases are handled better by this PR.
```diff
@@ -1412,7 +1412,7 @@ void EvaluateWithElementFloating(var_types simdBaseType, TSimd* result, const TS
         case TYP_DOUBLE:
         {
-            result->f64[arg1] = static_cast<float>(arg2);
+            result->f64[arg1] = arg2;
```
This and the change in valuenum.cpp are unrelated things I noticed while reading how the SIMD constant folding works.

Yep, that's pretty typical. I'd actually like if we had a way for SPMI to track the diff of the method local data section as well to help account for such changes. Possibly @EgorBo or @jakobbotsch have ideas on if that's feasible.
```cpp
// because all scalar and vector loads from memory zero the upper.
assert(emitComp->IsBaselineVector512IsaSupportedDebugOnly());
dataSize = 32;
ins      = INS_vbroadcastf32x8;
```
nit: It's not as important nowadays and I wouldn't block the PR on it. But it might be beneficial to take a `simdBaseType` and use it to pick a more "optimal" instruction where possible.

This "can" impact alignment, atomicity, general readability of the disasm output, potentially even performance, etc. So trying to match the broadcast used to the user type is beneficial where possible. Of course we may not always have a type and `float` is the common fallback, but doing a best case of picking `f32x8` vs `f64x4` vs `i32x8` vs `i64x4` and so on is beneficial.
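The suggestion above could be sketched roughly like this. This is a hypothetical mapping made up for illustration, not the JIT's actual emitter tables; the enum and function names are assumptions:

```cpp
#include <cassert>
#include <cstring>

// Illustrative element-type enum; the real JIT uses var_types/CorInfoType.
enum BaseType { BT_FLOAT, BT_DOUBLE, BT_INT, BT_LONG };

// Pick a 256-bit-halves broadcast mnemonic that matches the user's element
// type, falling back to the float form when no type is known.
static const char* BroadcastFor256BitHalves(BaseType t)
{
    switch (t)
    {
        case BT_DOUBLE: return "vbroadcastf64x4";
        case BT_INT:    return "vbroadcasti32x8";
        case BT_LONG:   return "vbroadcasti64x4";
        case BT_FLOAT:
        default:        return "vbroadcastf32x8"; // float is the common fallback
    }
}
```

All four mnemonics load the same 32 bytes and duplicate them into both halves of a 512-bit register; the choice only changes how the disassembly reads and which execution domain the decoder assumes.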
This is what we chatted about yesterday. This method is only called from `genSetRegToConst`, which generates code for uncontained `CNS_VEC`, because its consumer needs it in a register. `CNS_VEC` doesn't have a `simdBaseType` and apparently doesn't have room to add one.

All we know at this callsite is what the constant is and what register to put it in. The best we can do is produce a register with the correct content, hopefully in the most efficient way possible.
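With no base type available, the choice at this callsite reduces to the constant's bytes and the target register width. A hypothetical sketch of that decision (the struct, function name, and instruction table are assumptions for illustration, not the real `genSetRegToConst` code):

```cpp
#include <cassert>
#include <cstring>

struct ConstLoadPlan
{
    unsigned    dataSize; // bytes emitted to the data section
    const char* ins;      // instruction used to materialize the register
};

// Given the target register size and the smallest repeating chunk of the
// constant's bytes, pick a data-section size and a load instruction.
static ConstLoadPlan PlanConstVecLoad(unsigned regSizeBytes, unsigned repeatChunk)
{
    // No smaller repeating pattern: load the full constant.
    if (repeatChunk >= regSizeBytes)
    {
        return { regSizeBytes, "vmovups" };
    }
    switch (repeatChunk)
    {
        case 4:  return { 4,  "vbroadcastss" };
        case 8:  return { 8,  "vbroadcastsd" };
        case 16: return { 16, "vbroadcastf32x4" };
        default: return { regSizeBytes, "vmovups" };
    }
}
```

For example, a 32-byte register whose constant repeats every 4 bytes would emit only 4 bytes of data and materialize the register with `vbroadcastss`, trading a slightly longer instruction for a much smaller data section.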
You should still be able to get the user of the node to see what the likely type is, and default to `float` if a user doesn't exist or if it isn't a hwintrinsic node.
True, I could make a semi-informed guess, but in this particular case it's a purely aesthetic concern. Even going back to old CPU architectures that had a bypass delay when switching between integer and float domains, load and store instructions have always been type-agnostic in practice. And element size doesn't matter here because we're not using embedded masking; otherwise the constant would be contained in another instruction and not getting its own codegen.

So I think in this case, guessing the type and changing the instruction mnemonic is unnecessary complexity.
CC @dotnet/jit-contrib, @EgorBo, @jakobbotsch for secondary review

Ping @dotnet/jit-contrib, @EgorBo, @jakobbotsch for secondary review
/azp run runtime-coreclr outerloop
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.
Noticed this while looking at the codegen for floating->integral casts, where we use `vfixupimms[sd]`, which is a SIMD scalar instruction that takes a control table as a vector. The table value for that instruction is almost always const, and it's the operand that supports a memory load, so we end up emitting a full SIMD16 vector to the data section for what ends up being a DWORD or QWORD reloc. This change identifies vector constants contained by instructions that read only a scalar and shrinks the emitted data section value down to the scalar size.

We were also emitting full vectors to the data section for uncontained CNS_VEC to be loaded to a register for use. #92017 had previously addressed compressing large vectors down to a SIMD16 broadcast when they were immediately consumed by STOREIND. This PR compresses down as far as a 4-byte scalar, and does it for all uncontained CNS_VEC loads.
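The broadcast compression described above can be illustrated with a small standalone sketch (a hypothetical helper, not the actual JIT implementation): find the smallest power-of-two chunk that repeats across the constant's bytes, then emit only that chunk to the data section and materialize the register with a broadcast.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: given the raw bytes of a vector constant, find the
// smallest power-of-two chunk size (starting at the 4-byte scalar floor the
// PR describes) that repeats across the whole value. A constant that repeats
// every 4 bytes can be stored as a 4-byte data-section entry and loaded with
// a broadcast instead of emitting the full vector.
static unsigned MinRepeatingChunk(const uint8_t* data, unsigned size)
{
    for (unsigned chunk = 4; chunk < size; chunk *= 2)
    {
        bool repeats = true;
        for (unsigned i = chunk; i < size; i += chunk)
        {
            if (memcmp(data, data + i, chunk) != 0)
            {
                repeats = false;
                break;
            }
        }
        if (repeats)
        {
            return chunk; // smallest repeating unit found
        }
    }
    return size; // no smaller repeating pattern; emit the full constant
}
```

Under this scheme a 64-byte all-same-element constant shrinks to a single 4- or 8-byte data-section entry, which is where the "+1 byte of code for -8 or more bytes of data" trade-off mentioned earlier comes from.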
Diffs