Arm64: Evaluate if it is possible to combine subsequent field loads in a single load #64815

Closed
@kunalspathak

Description

Evaluate how feasible it would be to combine loads of adjacent fields into a single ldp instead of loading each field separately at its point of use. This cannot be treated as a peephole optimization; it needs an analysis in an earlier phase that looks for loads of consecutive fields and, when it finds them, combines them into a single load.

class Body { public double x, y, z, vx, vy, vz, mass; }
...
foreach (var b in bodies) {
    b.x += dt * b.vx; b.y += dt * b.vy; b.z += dt * b.vz;
}

The code below is generated for the loop body, which multiplies and accumulates doubles.

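  • The loads marked #1 (offsets 8 and 16) can be combined into ldp dX, dY, [x3, #8]
  • The loads marked #2 (offsets 32 and 40) can be combined into ldp dX, dY, [x3, #32]
  • The stores marked #3 (offsets 8 and 16) can be combined into stp dX, dY, [x3, #8]
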
G_M56457_IG05:              ;; offset=0114H
        D37D7C43          ubfiz   x3, x2, #3, #32
        91004063          add     x3, x3, #16
        F8636803          ldr     x3, [x0, x3]
        FD400470          ldr     d16, [x3,#8]   ; <-- #1
        FD401071          ldr     d17, [x3,#32]  ; <-- #2
        1E710811          fmul    d17, d0, d17
        1E712A10          fadd    d16, d16, d17
        FD000470          str     d16, [x3,#8]   ; <-- #3
        FD400870          ldr     d16, [x3,#16]  ; <-- #1
        FD401471          ldr     d17, [x3,#40]  ; <-- #2
        1E710811          fmul    d17, d0, d17
        1E712A10          fadd    d16, d16, d17
        FD000870          str     d16, [x3,#16]  ; <-- #3
        FD400C70          ldr     d16, [x3,#24]
        FD401871          ldr     d17, [x3,#48]
        1E710811          fmul    d17, d0, d17
        1E712A10          fadd    d16, d16, d17
        FD000C70          str     d16, [x3,#24]
        11000442          add     w2, w2, #1
        6B02003F          cmp     w1, w2
        54FFFD8C          bgt     G_M56457_IG05
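
For illustration, the body of the loop above might look roughly like this once the three pairs are combined. This is a hand-written sketch, not JIT output: the scratch registers d18 and d19 are assumptions, the address computation and loop control are omitted, and it relies on x, y, vx and vy being distinct fields of the same object so that loading y before storing x is safe.

```asm
        ldp     d16, d17, [x3, #8]    ; #1: load x and y together
        ldp     d18, d19, [x3, #32]   ; #2: load vx and vy together
        fmul    d18, d0, d18          ; dt * vx
        fmul    d19, d0, d19          ; dt * vy
        fadd    d16, d16, d18         ; x + dt * vx
        fadd    d17, d17, d19         ; y + dt * vy
        stp     d16, d17, [x3, #8]    ; #3: store x and y together
        ldr     d16, [x3, #24]        ; z has no unpaired neighbour left, so it stays a single load
        ldr     d17, [x3, #48]        ; vz likewise
        fmul    d17, d0, d17          ; dt * vz
        fadd    d16, d16, d17         ; z + dt * vz
        str     d16, [x3, #24]
```

In the original listing the two halves of each pair are separated by other instructions (for example, the store to [x3,#8] sits between the loads from [x3,#8] and [x3,#16]), so a pass that only inspects adjacent instructions would not find them.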

Reference: https://godbolt.org/z/9jY5hYnoa

category:implementation
theme:codegen
skill-level:intermediate
cost:medium
impact:medium

Labels

  • Priority:3 (Work that is nice to have)
  • arch-arm64
  • area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)
  • needs-further-triage (Issue has been initially triaged, but needs deeper consideration or reconsideration)
  • optimization
