[AMDGPU] Still set up the two SGPRs for queue ptr even if it is COV5 #112403

Merged
merged 1 commit into from
Nov 8, 2024

shiltian
Contributor

No description provided.

Contributor Author

shiltian commented Oct 15, 2024

This stack of pull requests is managed by Graphite.

@shiltian force-pushed the users/shiltian/queue-ptr-cov5 branch from e5bdbf6 to 8e6bdab (October 15, 2024 20:01)
@shiltian changed the base branch from main to users/shiltian/autogen-andorbitset (October 15, 2024 20:01)
@shiltian requested a review from arsenm (October 15, 2024 20:03)
@shiltian force-pushed the users/shiltian/autogen-andorbitset branch from 28df934 to e35360a (October 15, 2024 20:07)
@shiltian force-pushed the users/shiltian/queue-ptr-cov5 branch from 8e6bdab to de01250 (October 15, 2024 20:08)
@shiltian force-pushed the users/shiltian/autogen-andorbitset branch from e35360a to 700bdb6 (October 15, 2024 20:22)
@shiltian force-pushed the users/shiltian/queue-ptr-cov5 branch from de01250 to 8c9fc97 (October 15, 2024 20:22)
@llvmbot
Member

llvmbot commented Oct 16, 2024

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Shilei Tian (shiltian)

Changes

Patch is 26.04 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/112403.diff

538 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp (+1-3)
  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+2-6)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/addsubu64.ll (+22-22)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll (+102-182)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmin.ll (+102-182)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_udec_wrap.ll (+225-225)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_uinc_wrap.ll (+240-240)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/bool-legalization.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/call-outgoing-stack-args.ll (+10-10)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/crash-stack-address-O0.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/cvt_f32_ubyte.ll (+44-44)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergent-control-flow.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/dropped_debug_info_assert.ll (+29-28)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/dynamic-alloca-uniform.ll (+15-15)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.ll (+68-68)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch.ll (+158-158)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fmamix-constant-bus-violation.ll (+9-9)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fp-atomics-gfx940.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fp64-atomics-gfx90a.ll (+346-346)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll (+314-314)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/function-returns.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/implicit-kernarg-backend-usage-global-isel.ll (+23-20)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/inline-asm-mismatched-size.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement-stack-lower.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.large.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-amdgpu_kernel.ll (+236-236)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-assert-align.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-atomicrmw.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-abi-attribute-hints.ll (+55-53)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-implicit-args.ll (+484-468)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-return-values.ll (+1625-1580)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-sret.ll (+36-35)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call.ll (+2156-2092)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-constant-fold-vector-op.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-fence.ll (+80-80)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-indirect-call.ll (+32-31)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll (+51-51)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-tail-call.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/lds-global-value.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/lds-zero-initializer.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.div.scale.ll (+376-388)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.end.cf.i32.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.end.cf.i64.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.global.atomic.csub.ll (+12-12)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.if.break.i32.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.if.break.i64.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.intersect_ray.ll (+87-140)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.mfma.gfx90a.ll (+51-51)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.mov.dpp.ll (+15-15)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sbfe.ll (+49-49)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.set.inactive.ll (+43-43)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.trig.preop.ll (+15-15)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ubfe.ll (+66-66)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.update.dpp.ll (+49-49)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/localizer.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/madmix-constant-bus-violation.ll (+9-9)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/mul-known-bits.i64.ll (+111-111)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll (+61-61)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/non-entry-alloca.ll (+9-9)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/sdivrem.ll (+796-796)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/shl-ext-reduce.ll (+10-10)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/smrd.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/store-local.128.ll (+244-244)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/store-local.96.ll (+242-218)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/udivrem.ll (+419-418)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/vni8-across-blocks.ll (+125-125)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/widen-i8-i16-scalar-loads.ll (+94-94)
  • (modified) llvm/test/CodeGen/AMDGPU/add.ll (+396-395)
  • (modified) llvm/test/CodeGen/AMDGPU/add.v2i16.ll (+210-210)
  • (modified) llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll (+53-53)
  • (modified) llvm/test/CodeGen/AMDGPU/always-uniform.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/amd.endpgm.ll (+10-10)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fold-binop-select.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll (+2850-2855)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-demote-scc-branches.ll (+129-129)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll (+30-30)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu.work-item-intrinsics.deprecated.ll (+18-18)
  • (modified) llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll (+5-5)
  • (added) llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.o ()
  • (modified) llvm/test/CodeGen/AMDGPU/amdpal-elf.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/andorbitset.ll (+18-18)
  • (modified) llvm/test/CodeGen/AMDGPU/andorxorinvimm.ll (+18-18)
  • (modified) llvm/test/CodeGen/AMDGPU/anyext.ll (+32-32)
  • (modified) llvm/test/CodeGen/AMDGPU/atomic_optimizations_buffer.ll (+608-619)
  • (modified) llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll (+1815-1822)
  • (modified) llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll (+2998-3001)
  • (modified) llvm/test/CodeGen/AMDGPU/atomic_optimizations_raw_buffer.ll (+508-518)
  • (modified) llvm/test/CodeGen/AMDGPU/atomic_optimizations_struct_buffer.ll (+602-608)
  • (modified) llvm/test/CodeGen/AMDGPU/atomics-hw-remarks-gfx90a.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/atomics_cond_sub.ll (+35-35)
  • (modified) llvm/test/CodeGen/AMDGPU/bf16.ll (+10-10)
  • (modified) llvm/test/CodeGen/AMDGPU/bfe-combine.ll (+18-18)
  • (modified) llvm/test/CodeGen/AMDGPU/bfe-patterns.ll (+75-75)
  • (modified) llvm/test/CodeGen/AMDGPU/bfi_int.ll (+232-232)
  • (modified) llvm/test/CodeGen/AMDGPU/bfi_nested.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/bfm.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/bitreverse.ll (+107-107)
  • (modified) llvm/test/CodeGen/AMDGPU/blender-no-live-segment-at-def-implicit-def.ll (+35-32)
  • (modified) llvm/test/CodeGen/AMDGPU/br_cc.f16.ll (+46-46)
  • (modified) llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll (+379-397)
  • (modified) llvm/test/CodeGen/AMDGPU/branch-relax-spill.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/branch-relaxation.ll (+134-134)
  • (modified) llvm/test/CodeGen/AMDGPU/bswap.ll (+21-21)
  • (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fadd.ll (+930-1626)
  • (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fmax.ll (+521-929)
  • (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fmin.ll (+521-929)
  • (modified) llvm/test/CodeGen/AMDGPU/buffer-rsrc-ptr-ops.ll (+31-31)
  • (modified) llvm/test/CodeGen/AMDGPU/build_vector.ll (+40-40)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage-agpr.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage0.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage1.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage2.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage3.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-argument-types.ll (+1463-1221)
  • (modified) llvm/test/CodeGen/AMDGPU/call-reqd-group-size.ll (+18-18)
  • (modified) llvm/test/CodeGen/AMDGPU/call-waitcnt.ll (+47-42)
  • (modified) llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/calling-conventions.ll (+93-87)
  • (modified) llvm/test/CodeGen/AMDGPU/carryout-selection.ll (+916-918)
  • (modified) llvm/test/CodeGen/AMDGPU/cc-update.ll (+169-153)
  • (modified) llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll (+41-41)
  • (modified) llvm/test/CodeGen/AMDGPU/cgp-addressing-modes-gfx1030.ll (+1-2)
  • (modified) llvm/test/CodeGen/AMDGPU/cgp-addressing-modes-gfx908.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/cgp-bitfield-extract.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll (+13-13)
  • (modified) llvm/test/CodeGen/AMDGPU/clamp-modifier.ll (+140-141)
  • (modified) llvm/test/CodeGen/AMDGPU/clamp.ll (+334-334)
  • (modified) llvm/test/CodeGen/AMDGPU/cluster_stores.ll (+48-48)
  • (modified) llvm/test/CodeGen/AMDGPU/code-object-v3.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/collapse-endcf.ll (+21-21)
  • (modified) llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll (+67-67)
  • (modified) llvm/test/CodeGen/AMDGPU/combine-reg-or-const.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/combine-vload-extract.ll (+11-11)
  • (modified) llvm/test/CodeGen/AMDGPU/copy-illegal-type.ll (+67-66)
  • (modified) llvm/test/CodeGen/AMDGPU/copy-to-reg-scc-clobber.ll (+26-26)
  • (modified) llvm/test/CodeGen/AMDGPU/copy_to_scc.ll (+12-12)
  • (modified) llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll (+32-28)
  • (modified) llvm/test/CodeGen/AMDGPU/ctlz.ll (+161-161)
  • (modified) llvm/test/CodeGen/AMDGPU/ctlz_zero_undef.ll (+174-174)
  • (modified) llvm/test/CodeGen/AMDGPU/ctpop16.ll (+115-115)
  • (modified) llvm/test/CodeGen/AMDGPU/ctpop64.ll (+146-146)
  • (modified) llvm/test/CodeGen/AMDGPU/cttz.ll (+144-144)
  • (modified) llvm/test/CodeGen/AMDGPU/cttz_zero_undef.ll (+125-125)
  • (modified) llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll (+217-215)
  • (modified) llvm/test/CodeGen/AMDGPU/dag-divergence-atomic.ll (+186-186)
  • (modified) llvm/test/CodeGen/AMDGPU/dag-preserve-disjoint-flag.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/dagcomb-extract-vec-elt-different-sizes.ll (+18-18)
  • (modified) llvm/test/CodeGen/AMDGPU/dagcombine-lshr-and-cmp.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/dagcombine-setcc-select.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/divergence-driven-buildvector.ll (+88-88)
  • (modified) llvm/test/CodeGen/AMDGPU/divergence-driven-sext-inreg.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/divergence-driven-trunc-to-i1.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/ds-alignment.ll (+45-45)
  • (modified) llvm/test/CodeGen/AMDGPU/ds-combine-large-stride.ll (+13-13)
  • (modified) llvm/test/CodeGen/AMDGPU/ds-sub-offset.ll (+12-12)
  • (modified) llvm/test/CodeGen/AMDGPU/ds_read2.ll (+135-132)
  • (modified) llvm/test/CodeGen/AMDGPU/ds_write2.ll (+79-79)
  • (modified) llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/exec-mask-opt-cannot-create-empty-or-backward-segment.ll (+5-5)
  • (modified) llvm/test/CodeGen/AMDGPU/expand-scalar-carry-out-select-user.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/extract_vector_dynelt.ll (+376-376)
  • (modified) llvm/test/CodeGen/AMDGPU/extract_vector_elt-f16.ll (+158-161)
  • (modified) llvm/test/CodeGen/AMDGPU/extract_vector_elt-i16.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/extract_vector_elt-i8.ll (+48-48)
  • (modified) llvm/test/CodeGen/AMDGPU/extractelt-to-trunc.ll (+40-40)
  • (modified) llvm/test/CodeGen/AMDGPU/fabs.f16.ll (+77-78)
  • (modified) llvm/test/CodeGen/AMDGPU/fabs.f64.ll (+41-41)
  • (modified) llvm/test/CodeGen/AMDGPU/fabs.ll (+28-28)
  • (modified) llvm/test/CodeGen/AMDGPU/fadd.f16.ll (+130-130)
  • (modified) llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.global.ll (+18-18)
  • (modified) llvm/test/CodeGen/AMDGPU/fcanonicalize.f16.ll (+208-208)
  • (modified) llvm/test/CodeGen/AMDGPU/fcanonicalize.ll (+236-236)
  • (modified) llvm/test/CodeGen/AMDGPU/fcmp.f16.ll (+930-930)
  • (modified) llvm/test/CodeGen/AMDGPU/fcopysign.f16.ll (+601-603)
  • (modified) llvm/test/CodeGen/AMDGPU/fcopysign.f32.ll (+224-224)
  • (modified) llvm/test/CodeGen/AMDGPU/fcopysign.f64.ll (+241-241)
  • (modified) llvm/test/CodeGen/AMDGPU/fdiv.f16.ll (+216-216)
  • (modified) llvm/test/CodeGen/AMDGPU/fdiv.ll (+300-301)
  • (modified) llvm/test/CodeGen/AMDGPU/fdiv32-to-rcp-folding.ll (+71-71)
  • (modified) llvm/test/CodeGen/AMDGPU/flat-scratch-init.ll (+30-28)
  • (modified) llvm/test/CodeGen/AMDGPU/flat-scratch-svs.ll (+54-54)
  • (modified) llvm/test/CodeGen/AMDGPU/flat-scratch.ll (+322-322)
  • (modified) llvm/test/CodeGen/AMDGPU/flat_atomics.ll (+2273-2273)
  • (modified) llvm/test/CodeGen/AMDGPU/flat_atomics_i32_system.ll (+239-239)
  • (modified) llvm/test/CodeGen/AMDGPU/flat_atomics_i64.ll (+1225-1225)
  • (modified) llvm/test/CodeGen/AMDGPU/flat_atomics_i64_system.ll (+135-135)
  • (modified) llvm/test/CodeGen/AMDGPU/fma-combine.ll (+497-497)
  • (modified) llvm/test/CodeGen/AMDGPU/fmax3.ll (+72-72)
  • (modified) llvm/test/CodeGen/AMDGPU/fmax_legacy.f64.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/fmaximum.ll (+10-10)
  • (modified) llvm/test/CodeGen/AMDGPU/fmed3.ll (+551-551)
  • (modified) llvm/test/CodeGen/AMDGPU/fmin3.ll (+104-108)
  • (modified) llvm/test/CodeGen/AMDGPU/fmin_legacy.f64.ll (+16-16)
  • (modified) llvm/test/CodeGen/AMDGPU/fminimum.ll (+10-10)
  • (modified) llvm/test/CodeGen/AMDGPU/fmul-2-combine-multi-use.ll (+126-126)
  • (modified) llvm/test/CodeGen/AMDGPU/fmul.f16.ll (+222-258)
  • (modified) llvm/test/CodeGen/AMDGPU/fmuladd.f16.ll (+236-236)
  • (modified) llvm/test/CodeGen/AMDGPU/fnearbyint.ll (+90-88)
  • (modified) llvm/test/CodeGen/AMDGPU/fneg-combines.new.ll (+38-38)
  • (modified) llvm/test/CodeGen/AMDGPU/fneg-fabs.f16.ll (+89-89)
  • (modified) llvm/test/CodeGen/AMDGPU/fneg-fabs.f64.ll (+91-91)
  • (modified) llvm/test/CodeGen/AMDGPU/fneg-fabs.ll (+51-51)
  • (modified) llvm/test/CodeGen/AMDGPU/fneg-modifier-casting.ll (+22-22)
  • (modified) llvm/test/CodeGen/AMDGPU/fneg.f16.ll (+62-62)
  • (modified) llvm/test/CodeGen/AMDGPU/fneg.ll (+143-143)
  • (modified) llvm/test/CodeGen/AMDGPU/fp-atomics-gfx940.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/fp-classify.ll (+129-129)
  • (modified) llvm/test/CodeGen/AMDGPU/fp-min-max-buffer-atomics.ll (+154-153)
  • (modified) llvm/test/CodeGen/AMDGPU/fp-min-max-buffer-ptr-atomics.ll (+142-141)
  • (modified) llvm/test/CodeGen/AMDGPU/fp16_to_fp32.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/fp16_to_fp64.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/fp32_to_fp16.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/fp64-atomics-gfx90a.ll (+296-296)
  • (modified) llvm/test/CodeGen/AMDGPU/fp64-min-max-buffer-atomics.ll (+114-114)
  • (modified) llvm/test/CodeGen/AMDGPU/fp64-min-max-buffer-ptr-atomics.ll (+114-114)
  • (modified) llvm/test/CodeGen/AMDGPU/fp_to_sint.ll (+64-64)
  • (modified) llvm/test/CodeGen/AMDGPU/fp_to_uint.ll (+48-48)
  • (modified) llvm/test/CodeGen/AMDGPU/fpext.f16.ll (+291-551)
  • (modified) llvm/test/CodeGen/AMDGPU/fptosi.f16.ll (+27-27)
  • (modified) llvm/test/CodeGen/AMDGPU/fptoui.f16.ll (+28-28)
  • (modified) llvm/test/CodeGen/AMDGPU/fptrunc.f16.ll (+222-222)
  • (modified) llvm/test/CodeGen/AMDGPU/fptrunc.ll (+215-215)
  • (modified) llvm/test/CodeGen/AMDGPU/frem.ll (+706-706)
  • (modified) llvm/test/CodeGen/AMDGPU/fshl.ll (+343-343)
  • (modified) llvm/test/CodeGen/AMDGPU/fshr.ll (+236-236)
  • (modified) llvm/test/CodeGen/AMDGPU/fsqrt.f32.ll (+74-74)
  • (modified) llvm/test/CodeGen/AMDGPU/fsub.f16.ll (+156-192)
  • (modified) llvm/test/CodeGen/AMDGPU/function-args-inreg.ll (+574-583)
  • (modified) llvm/test/CodeGen/AMDGPU/fused-bitlogic.ll (+30-30)
  • (modified) llvm/test/CodeGen/AMDGPU/gds-allocation.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/gep-const-address-space.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/global-atomicrmw-fadd-wrong-subtarget.ll (+7-7)
  • (modified) llvm/test/CodeGen/AMDGPU/global-atomicrmw-fadd.ll (+37-37)
  • (modified) llvm/test/CodeGen/AMDGPU/global-i16-load-store.ll (+12-12)
  • (modified) llvm/test/CodeGen/AMDGPU/global-load-saddr-to-vaddr.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics.ll (+2360-2360)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_i32_system.ll (+230-230)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_i64.ll (+2117-2117)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_i64_system.ll (+150-150)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fadd.ll (+2686-2486)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmax.ll (+1736-1586)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmin.ll (+1736-1586)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fsub.ll (+2590-2390)
  • (modified) llvm/test/CodeGen/AMDGPU/greedy-reverse-local-assignment.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/half.ll (+247-247)
  • (modified) llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/idiv-licm.ll (+207-207)
  • (modified) llvm/test/CodeGen/AMDGPU/idot2.ll (+842-823)
  • (modified) llvm/test/CodeGen/AMDGPU/idot4s.ll (+713-710)
  • (modified) llvm/test/CodeGen/AMDGPU/idot4u.ll (+1322-1325)
  • (modified) llvm/test/CodeGen/AMDGPU/idot8s.ll (+366-366)
  • (modified) llvm/test/CodeGen/AMDGPU/idot8u.ll (+492-488)
  • (modified) llvm/test/CodeGen/AMDGPU/imm.ll (+300-300)
  • (modified) llvm/test/CodeGen/AMDGPU/imm16.ll (+244-244)
  • (modified) llvm/test/CodeGen/AMDGPU/implicit-kernarg-backend-usage.ll (+23-21)
  • (modified) llvm/test/CodeGen/AMDGPU/indirect-addressing-si.ll (+1202-1203)
  • (modified) llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/indirect-call-known-callees.ll (+35-33)
  • (modified) llvm/test/CodeGen/AMDGPU/infinite-loop.ll (+4)
  • (modified) llvm/test/CodeGen/AMDGPU/inline-asm.i128.ll (+12-12)
  • (modified) llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll (+64-62)
  • (modified) llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll (+1082-1081)
  • (modified) llvm/test/CodeGen/AMDGPU/insert_vector_elt.ll (+819-819)
  • (modified) llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2bf16.ll (+261-253)
  • (modified) llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll (+371-386)
  • (modified) llvm/test/CodeGen/AMDGPU/insert_waitcnt_for_precise_memory.ll (+157-157)
  • (modified) llvm/test/CodeGen/AMDGPU/kernel-args.ll ()
  • (modified) llvm/test/CodeGen/AMDGPU/kernel-vgpr-spill-mubuf-with-voffset.ll ()
  • (modified) llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll ()
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.atomic.cond.sub.ll (+12-12)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pkrtz.ll (+193-193)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.exp.row.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.w32.ll (+532-532)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.w64.ll (+510-510)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.bf16.bf16.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f16.f16.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.global.atomic.ordered.add.b64.ll (+13-14)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
index 351e9f25e29cfc..3ff3cc26153964 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
@@ -472,9 +472,7 @@ static void allocateHSAUserSGPRs(CCState &CCInfo,
     CCInfo.AllocateReg(DispatchPtrReg);
   }
 
-  const Module *M = MF.getFunction().getParent();
-  if (UserSGPRInfo.hasQueuePtr() &&
-      AMDGPU::getAMDHSACodeObjectVersion(*M) < AMDGPU::AMDHSA_COV5) {
+  if (UserSGPRInfo.hasQueuePtr()) {
     Register QueuePtrReg = Info.addQueuePtr(TRI);
     MF.addLiveIn(QueuePtrReg, &AMDGPU::SGPR_64RegClass);
     CCInfo.AllocateReg(QueuePtrReg);
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 8c197f23149612..91778223bc79f0 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -2393,9 +2393,7 @@ void SITargetLowering::allocateSpecialInputSGPRs(
   if (UserSGPRInfo.hasDispatchPtr())
     allocateSGPR64Input(CCInfo, ArgInfo.DispatchPtr);
 
-  const Module *M = MF.getFunction().getParent();
-  if (UserSGPRInfo.hasQueuePtr() &&
-      AMDGPU::getAMDHSACodeObjectVersion(*M) < AMDGPU::AMDHSA_COV5)
+  if (UserSGPRInfo.hasQueuePtr())
     allocateSGPR64Input(CCInfo, ArgInfo.QueuePtr);
 
   // Implicit arg ptr takes the place of the kernarg segment pointer. This is a
@@ -2446,9 +2444,7 @@ void SITargetLowering::allocateHSAUserSGPRs(CCState &CCInfo,
     CCInfo.AllocateReg(DispatchPtrReg);
   }
 
-  const Module *M = MF.getFunction().getParent();
-  if (UserSGPRInfo.hasQueuePtr() &&
-      AMDGPU::getAMDHSACodeObjectVersion(*M) < AMDGPU::AMDHSA_COV5) {
+  if (UserSGPRInfo.hasQueuePtr()) {
     Register QueuePtrReg = Info.addQueuePtr(TRI);
     MF.addLiveIn(QueuePtrReg, &AMDGPU::SGPR_64RegClass);
     CCInfo.AllocateReg(QueuePtrReg);
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/addsubu64.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/addsubu64.ll
index 359c1e53de99e3..4345fa96da8c88 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/addsubu64.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/addsubu64.ll
@@ -6,15 +6,15 @@ define amdgpu_kernel void @s_add_u64(ptr addrspace(1) %out, i64 %a, i64 %b) {
 ; GFX11-LABEL: s_add_u64:
 ; GFX11:       ; %bb.0: ; %entry
 ; GFX11-NEXT:    s_clause 0x1
-; GFX11-NEXT:    s_load_b128 s[4:7], s[2:3], 0x24
-; GFX11-NEXT:    s_load_b64 s[0:1], s[2:3], 0x34
+; GFX11-NEXT:    s_load_b128 s[0:3], s[4:5], 0x24
+; GFX11-NEXT:    s_load_b64 s[4:5], s[4:5], 0x34
 ; GFX11-NEXT:    v_mov_b32_e32 v2, 0
 ; GFX11-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-NEXT:    s_add_u32 s0, s6, s0
-; GFX11-NEXT:    s_addc_u32 s1, s7, s1
+; GFX11-NEXT:    s_add_u32 s2, s2, s4
+; GFX11-NEXT:    s_addc_u32 s3, s3, s5
 ; GFX11-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX11-NEXT:    v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1
-; GFX11-NEXT:    global_store_b64 v2, v[0:1], s[4:5]
+; GFX11-NEXT:    v_dual_mov_b32 v0, s2 :: v_dual_mov_b32 v1, s3
+; GFX11-NEXT:    global_store_b64 v2, v[0:1], s[0:1]
 ; GFX11-NEXT:    s_nop 0
 ; GFX11-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
 ; GFX11-NEXT:    s_endpgm
@@ -22,14 +22,14 @@ define amdgpu_kernel void @s_add_u64(ptr addrspace(1) %out, i64 %a, i64 %b) {
 ; GFX12-LABEL: s_add_u64:
 ; GFX12:       ; %bb.0: ; %entry
 ; GFX12-NEXT:    s_clause 0x1
-; GFX12-NEXT:    s_load_b128 s[4:7], s[2:3], 0x24
-; GFX12-NEXT:    s_load_b64 s[0:1], s[2:3], 0x34
+; GFX12-NEXT:    s_load_b128 s[0:3], s[4:5], 0x24
+; GFX12-NEXT:    s_load_b64 s[4:5], s[4:5], 0x34
 ; GFX12-NEXT:    v_mov_b32_e32 v2, 0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    s_add_nc_u64 s[0:1], s[6:7], s[0:1]
+; GFX12-NEXT:    s_add_nc_u64 s[2:3], s[2:3], s[4:5]
 ; GFX12-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX12-NEXT:    v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1
-; GFX12-NEXT:    global_store_b64 v2, v[0:1], s[4:5]
+; GFX12-NEXT:    v_dual_mov_b32 v0, s2 :: v_dual_mov_b32 v1, s3
+; GFX12-NEXT:    global_store_b64 v2, v[0:1], s[0:1]
 ; GFX12-NEXT:    s_nop 0
 ; GFX12-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
 ; GFX12-NEXT:    s_endpgm
@@ -58,15 +58,15 @@ define amdgpu_kernel void @s_sub_u64(ptr addrspace(1) %out, i64 %a, i64 %b) {
 ; GFX11-LABEL: s_sub_u64:
 ; GFX11:       ; %bb.0: ; %entry
 ; GFX11-NEXT:    s_clause 0x1
-; GFX11-NEXT:    s_load_b128 s[4:7], s[2:3], 0x24
-; GFX11-NEXT:    s_load_b64 s[0:1], s[2:3], 0x34
+; GFX11-NEXT:    s_load_b128 s[0:3], s[4:5], 0x24
+; GFX11-NEXT:    s_load_b64 s[4:5], s[4:5], 0x34
 ; GFX11-NEXT:    v_mov_b32_e32 v2, 0
 ; GFX11-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-NEXT:    s_sub_u32 s0, s6, s0
-; GFX11-NEXT:    s_subb_u32 s1, s7, s1
+; GFX11-NEXT:    s_sub_u32 s2, s2, s4
+; GFX11-NEXT:    s_subb_u32 s3, s3, s5
 ; GFX11-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX11-NEXT:    v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1
-; GFX11-NEXT:    global_store_b64 v2, v[0:1], s[4:5]
+; GFX11-NEXT:    v_dual_mov_b32 v0, s2 :: v_dual_mov_b32 v1, s3
+; GFX11-NEXT:    global_store_b64 v2, v[0:1], s[0:1]
 ; GFX11-NEXT:    s_nop 0
 ; GFX11-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
 ; GFX11-NEXT:    s_endpgm
@@ -74,14 +74,14 @@ define amdgpu_kernel void @s_sub_u64(ptr addrspace(1) %out, i64 %a, i64 %b) {
 ; GFX12-LABEL: s_sub_u64:
 ; GFX12:       ; %bb.0: ; %entry
 ; GFX12-NEXT:    s_clause 0x1
-; GFX12-NEXT:    s_load_b128 s[4:7], s[2:3], 0x24
-; GFX12-NEXT:    s_load_b64 s[0:1], s[2:3], 0x34
+; GFX12-NEXT:    s_load_b128 s[0:3], s[4:5], 0x24
+; GFX12-NEXT:    s_load_b64 s[4:5], s[4:5], 0x34
 ; GFX12-NEXT:    v_mov_b32_e32 v2, 0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    s_sub_nc_u64 s[0:1], s[6:7], s[0:1]
+; GFX12-NEXT:    s_sub_nc_u64 s[2:3], s[2:3], s[4:5]
 ; GFX12-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX12-NEXT:    v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1
-; GFX12-NEXT:    global_store_b64 v2, v[0:1], s[4:5]
+; GFX12-NEXT:    v_dual_mov_b32 v0, s2 :: v_dual_mov_b32 v1, s3
+; GFX12-NEXT:    global_store_b64 v2, v[0:1], s[0:1]
 ; GFX12-NEXT:    s_nop 0
 ; GFX12-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
 ; GFX12-NEXT:    s_endpgm
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll
index 43266554c2d8a6..382415f5653e4e 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll
@@ -1494,7 +1494,7 @@ define float @buffer_fat_ptr_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_m
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    v_mov_b32_e32 v1, s6
+; GFX12-NEXT:    v_mov_b32_e32 v1, s16
 ; GFX12-NEXT:    s_wait_storecnt 0x0
 ; GFX12-NEXT:    buffer_atomic_max_num_f32 v0, v1, s[0:3], null offen th:TH_ATOMIC_RETURN
 ; GFX12-NEXT:    s_wait_loadcnt 0x0
@@ -1504,7 +1504,7 @@ define float @buffer_fat_ptr_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_m
 ; GFX940-LABEL: buffer_fat_ptr_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_memory:
 ; GFX940:       ; %bb.0:
 ; GFX940-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX940-NEXT:    v_mov_b32_e32 v2, s6
+; GFX940-NEXT:    v_mov_b32_e32 v2, s16
 ; GFX940-NEXT:    v_mov_b32_e32 v1, v0
 ; GFX940-NEXT:    buffer_load_dword v0, v2, s[0:3], 0 offen
 ; GFX940-NEXT:    s_mov_b64 s[4:5], 0
@@ -1531,7 +1531,7 @@ define float @buffer_fat_ptr_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_m
 ; GFX11-LABEL: buffer_fat_ptr_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_memory:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:    v_mov_b32_e32 v1, s6
+; GFX11-NEXT:    v_mov_b32_e32 v1, s16
 ; GFX11-NEXT:    s_waitcnt_vscnt null, 0x0
 ; GFX11-NEXT:    buffer_atomic_max_f32 v0, v1, s[0:3], 0 offen glc
 ; GFX11-NEXT:    s_waitcnt vmcnt(0)
@@ -1542,13 +1542,9 @@ define float @buffer_fat_ptr_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_m
 ; GFX10-LABEL: buffer_fat_ptr_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_memory:
 ; GFX10:       ; %bb.0:
 ; GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT:    v_mov_b32_e32 v1, s18
-; GFX10-NEXT:    s_mov_b32 s4, s6
-; GFX10-NEXT:    s_mov_b32 s5, s7
-; GFX10-NEXT:    s_mov_b32 s6, s16
-; GFX10-NEXT:    s_mov_b32 s7, s17
+; GFX10-NEXT:    v_mov_b32_e32 v1, s20
 ; GFX10-NEXT:    s_waitcnt_vscnt null, 0x0
-; GFX10-NEXT:    buffer_atomic_fmax v0, v1, s[4:7], 0 offen glc
+; GFX10-NEXT:    buffer_atomic_fmax v0, v1, s[16:19], 0 offen glc
 ; GFX10-NEXT:    s_waitcnt vmcnt(0)
 ; GFX10-NEXT:    buffer_gl1_inv
 ; GFX10-NEXT:    buffer_gl0_inv
@@ -1557,14 +1553,10 @@ define float @buffer_fat_ptr_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_m
 ; GFX90A-LABEL: buffer_fat_ptr_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_memory:
 ; GFX90A:       ; %bb.0:
 ; GFX90A-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX90A-NEXT:    s_mov_b32 s4, s6
-; GFX90A-NEXT:    s_mov_b32 s5, s7
-; GFX90A-NEXT:    s_mov_b32 s6, s16
-; GFX90A-NEXT:    s_mov_b32 s7, s17
-; GFX90A-NEXT:    v_mov_b32_e32 v2, s18
+; GFX90A-NEXT:    v_mov_b32_e32 v2, s20
 ; GFX90A-NEXT:    v_mov_b32_e32 v1, v0
-; GFX90A-NEXT:    buffer_load_dword v0, v2, s[4:7], 0 offen
-; GFX90A-NEXT:    s_mov_b64 s[8:9], 0
+; GFX90A-NEXT:    buffer_load_dword v0, v2, s[16:19], 0 offen
+; GFX90A-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX90A-NEXT:    v_max_f32_e32 v3, v1, v1
 ; GFX90A-NEXT:  .LBB12_1: ; %atomicrmw.start
 ; GFX90A-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -1573,28 +1565,24 @@ define float @buffer_fat_ptr_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_m
 ; GFX90A-NEXT:    v_max_f32_e32 v0, v5, v5
 ; GFX90A-NEXT:    v_max_f32_e32 v4, v0, v3
 ; GFX90A-NEXT:    v_pk_mov_b32 v[0:1], v[4:5], v[4:5] op_sel:[0,1]
-; GFX90A-NEXT:    buffer_atomic_cmpswap v[0:1], v2, s[4:7], 0 offen glc
+; GFX90A-NEXT:    buffer_atomic_cmpswap v[0:1], v2, s[16:19], 0 offen glc
 ; GFX90A-NEXT:    s_waitcnt vmcnt(0)
 ; GFX90A-NEXT:    buffer_wbinvl1
 ; GFX90A-NEXT:    v_cmp_eq_u32_e32 vcc, v0, v5
-; GFX90A-NEXT:    s_or_b64 s[8:9], vcc, s[8:9]
-; GFX90A-NEXT:    s_andn2_b64 exec, exec, s[8:9]
+; GFX90A-NEXT:    s_or_b64 s[4:5], vcc, s[4:5]
+; GFX90A-NEXT:    s_andn2_b64 exec, exec, s[4:5]
 ; GFX90A-NEXT:    s_cbranch_execnz .LBB12_1
 ; GFX90A-NEXT:  ; %bb.2: ; %atomicrmw.end
-; GFX90A-NEXT:    s_or_b64 exec, exec, s[8:9]
+; GFX90A-NEXT:    s_or_b64 exec, exec, s[4:5]
 ; GFX90A-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX908-LABEL: buffer_fat_ptr_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_memory:
 ; GFX908:       ; %bb.0:
 ; GFX908-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX908-NEXT:    s_mov_b32 s4, s6
-; GFX908-NEXT:    s_mov_b32 s5, s7
-; GFX908-NEXT:    s_mov_b32 s6, s16
-; GFX908-NEXT:    s_mov_b32 s7, s17
-; GFX908-NEXT:    v_mov_b32_e32 v2, s18
+; GFX908-NEXT:    v_mov_b32_e32 v2, s20
 ; GFX908-NEXT:    v_mov_b32_e32 v1, v0
-; GFX908-NEXT:    buffer_load_dword v0, v2, s[4:7], 0 offen
-; GFX908-NEXT:    s_mov_b64 s[8:9], 0
+; GFX908-NEXT:    buffer_load_dword v0, v2, s[16:19], 0 offen
+; GFX908-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX908-NEXT:    v_max_f32_e32 v3, v1, v1
 ; GFX908-NEXT:  .LBB12_1: ; %atomicrmw.start
 ; GFX908-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -1604,28 +1592,24 @@ define float @buffer_fat_ptr_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_m
 ; GFX908-NEXT:    v_max_f32_e32 v4, v0, v3
 ; GFX908-NEXT:    v_mov_b32_e32 v0, v4
 ; GFX908-NEXT:    v_mov_b32_e32 v1, v5
-; GFX908-NEXT:    buffer_atomic_cmpswap v[0:1], v2, s[4:7], 0 offen glc
+; GFX908-NEXT:    buffer_atomic_cmpswap v[0:1], v2, s[16:19], 0 offen glc
 ; GFX908-NEXT:    s_waitcnt vmcnt(0)
 ; GFX908-NEXT:    buffer_wbinvl1
 ; GFX908-NEXT:    v_cmp_eq_u32_e32 vcc, v0, v5
-; GFX908-NEXT:    s_or_b64 s[8:9], vcc, s[8:9]
-; GFX908-NEXT:    s_andn2_b64 exec, exec, s[8:9]
+; GFX908-NEXT:    s_or_b64 s[4:5], vcc, s[4:5]
+; GFX908-NEXT:    s_andn2_b64 exec, exec, s[4:5]
 ; GFX908-NEXT:    s_cbranch_execnz .LBB12_1
 ; GFX908-NEXT:  ; %bb.2: ; %atomicrmw.end
-; GFX908-NEXT:    s_or_b64 exec, exec, s[8:9]
+; GFX908-NEXT:    s_or_b64 exec, exec, s[4:5]
 ; GFX908-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: buffer_fat_ptr_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_memory:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    s_mov_b32 s4, s6
-; GFX8-NEXT:    s_mov_b32 s5, s7
-; GFX8-NEXT:    s_mov_b32 s6, s16
-; GFX8-NEXT:    s_mov_b32 s7, s17
-; GFX8-NEXT:    v_mov_b32_e32 v2, s18
+; GFX8-NEXT:    v_mov_b32_e32 v2, s20
 ; GFX8-NEXT:    v_mov_b32_e32 v1, v0
-; GFX8-NEXT:    buffer_load_dword v0, v2, s[4:7], 0 offen
-; GFX8-NEXT:    s_mov_b64 s[8:9], 0
+; GFX8-NEXT:    buffer_load_dword v0, v2, s[16:19], 0 offen
+; GFX8-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX8-NEXT:    v_mul_f32_e32 v3, 1.0, v1
 ; GFX8-NEXT:  .LBB12_1: ; %atomicrmw.start
 ; GFX8-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -1635,26 +1619,22 @@ define float @buffer_fat_ptr_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_m
 ; GFX8-NEXT:    v_max_f32_e32 v4, v0, v3
 ; GFX8-NEXT:    v_mov_b32_e32 v0, v4
 ; GFX8-NEXT:    v_mov_b32_e32 v1, v5
-; GFX8-NEXT:    buffer_atomic_cmpswap v[0:1], v2, s[4:7], 0 offen glc
+; GFX8-NEXT:    buffer_atomic_cmpswap v[0:1], v2, s[16:19], 0 offen glc
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    buffer_wbinvl1
 ; GFX8-NEXT:    v_cmp_eq_u32_e32 vcc, v0, v5
-; GFX8-NEXT:    s_or_b64 s[8:9], vcc, s[8:9]
-; GFX8-NEXT:    s_andn2_b64 exec, exec, s[8:9]
+; GFX8-NEXT:    s_or_b64 s[4:5], vcc, s[4:5]
+; GFX8-NEXT:    s_andn2_b64 exec, exec, s[4:5]
 ; GFX8-NEXT:    s_cbranch_execnz .LBB12_1
 ; GFX8-NEXT:  ; %bb.2: ; %atomicrmw.end
-; GFX8-NEXT:    s_or_b64 exec, exec, s[8:9]
+; GFX8-NEXT:    s_or_b64 exec, exec, s[4:5]
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX7-LABEL: buffer_fat_ptr_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_memory:
 ; GFX7:       ; %bb.0:
 ; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-NEXT:    s_mov_b32 s4, s6
-; GFX7-NEXT:    s_mov_b32 s5, s7
-; GFX7-NEXT:    s_mov_b32 s6, s16
-; GFX7-NEXT:    s_mov_b32 s7, s17
-; GFX7-NEXT:    v_mov_b32_e32 v1, s18
-; GFX7-NEXT:    buffer_atomic_fmax v0, v1, s[4:7], 0 offen glc
+; GFX7-NEXT:    v_mov_b32_e32 v1, s20
+; GFX7-NEXT:    buffer_atomic_fmax v0, v1, s[16:19], 0 offen glc
 ; GFX7-NEXT:    s_waitcnt vmcnt(0)
 ; GFX7-NEXT:    buffer_wbinvl1
 ; GFX7-NEXT:    s_setpc_b64 s[30:31]
@@ -1670,7 +1650,7 @@ define void @buffer_fat_ptr_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    v_mov_b32_e32 v1, s6
+; GFX12-NEXT:    v_mov_b32_e32 v1, s16
 ; GFX12-NEXT:    s_wait_storecnt 0x0
 ; GFX12-NEXT:    buffer_atomic_max_num_f32 v0, v1, s[0:3], null offen
 ; GFX12-NEXT:    s_wait_storecnt 0x0
@@ -1680,7 +1660,7 @@ define void @buffer_fat_ptr_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_
 ; GFX940-LABEL: buffer_fat_ptr_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_memory:
 ; GFX940:       ; %bb.0:
 ; GFX940-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX940-NEXT:    v_mov_b32_e32 v2, s6
+; GFX940-NEXT:    v_mov_b32_e32 v2, s16
 ; GFX940-NEXT:    buffer_load_dword v1, v2, s[0:3], 0 offen
 ; GFX940-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX940-NEXT:    v_max_f32_e32 v3, v0, v0
@@ -1706,7 +1686,7 @@ define void @buffer_fat_ptr_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_
 ; GFX11-LABEL: buffer_fat_ptr_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_memory:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:    v_mov_b32_e32 v1, s6
+; GFX11-NEXT:    v_mov_b32_e32 v1, s16
 ; GFX11-NEXT:    s_waitcnt_vscnt null, 0x0
 ; GFX11-NEXT:    buffer_atomic_max_f32 v0, v1, s[0:3], 0 offen
 ; GFX11-NEXT:    s_waitcnt_vscnt null, 0x0
@@ -1717,13 +1697,9 @@ define void @buffer_fat_ptr_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_
 ; GFX10-LABEL: buffer_fat_ptr_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_memory:
 ; GFX10:       ; %bb.0:
 ; GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT:    v_mov_b32_e32 v1, s18
-; GFX10-NEXT:    s_mov_b32 s4, s6
-; GFX10-NEXT:    s_mov_b32 s5, s7
-; GFX10-NEXT:    s_mov_b32 s6, s16
-; GFX10-NEXT:    s_mov_b32 s7, s17
+; GFX10-NEXT:    v_mov_b32_e32 v1, s20
 ; GFX10-NEXT:    s_waitcnt_vscnt null, 0x0
-; GFX10-NEXT:    buffer_atomic_fmax v0, v1, s[4:7], 0 offen
+; GFX10-NEXT:    buffer_atomic_fmax v0, v1, s[16:19], 0 offen
 ; GFX10-NEXT:    s_waitcnt_vscnt null, 0x0
 ; GFX10-NEXT:    buffer_gl1_inv
 ; GFX10-NEXT:    buffer_gl0_inv
@@ -1732,13 +1708,9 @@ define void @buffer_fat_ptr_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_
 ; GFX90A-LABEL: buffer_fat_ptr_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_memory:
 ; GFX90A:       ; %bb.0:
 ; GFX90A-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX90A-NEXT:    s_mov_b32 s4, s6
-; GFX90A-NEXT:    s_mov_b32 s5, s7
-; GFX90A-NEXT:    s_mov_b32 s6, s16
-; GFX90A-NEXT:    s_mov_b32 s7, s17
-; GFX90A-NEXT:    v_mov_b32_e32 v2, s18
-; GFX90A-NEXT:    buffer_load_dword v1, v2, s[4:7], 0 offen
-; GFX90A-NEXT:    s_mov_b64 s[8:9], 0
+; GFX90A-NEXT:    v_mov_b32_e32 v2, s20
+; GFX90A-NEXT:    buffer_load_dword v1, v2, s[16:19], 0 offen
+; GFX90A-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX90A-NEXT:    v_max_f32_e32 v3, v0, v0
 ; GFX90A-NEXT:  .LBB13_1: ; %atomicrmw.start
 ; GFX90A-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -1746,28 +1718,24 @@ define void @buffer_fat_ptr_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_
 ; GFX90A-NEXT:    v_max_f32_e32 v0, v1, v1
 ; GFX90A-NEXT:    v_max_f32_e32 v0, v0, v3
 ; GFX90A-NEXT:    v_pk_mov_b32 v[4:5], v[0:1], v[0:1] op_sel:[0,1]
-; GFX90A-NEXT:    buffer_atomic_cmpswap v[4:5], v2, s[4:7], 0 offen glc
+; GFX90A-NEXT:    buffer_atomic_cmpswap v[4:5], v2, s[16:19], 0 offen glc
 ; GFX90A-NEXT:    s_waitcnt vmcnt(0)
 ; GFX90A-NEXT:    buffer_wbinvl1
 ; GFX90A-NEXT:    v_cmp_eq_u32_e32 vcc, v4, v1
-; GFX90A-NEXT:    s_or_b64 s[8:9], vcc, s[8:9]
+; GFX90A-NEXT:    s_or_b64 s[4:5], vcc, s[4:5]
 ; GFX90A-NEXT:    v_mov_b32_e32 v1, v4
-; GFX90A-NEXT:    s_andn2_b64 exec, exec, s[8:9]
+; GFX90A-NEXT:    s_andn2_b64 exec, exec, s[4:5]
 ; GFX90A-NEXT:    s_cbranch_execnz .LBB13_1
 ; GFX90A-NEXT:  ; %bb.2: ; %atomicrmw.end
-; GFX90A-NEXT:    s_or_b64 exec, exec, s[8:9]
+; GFX90A-NEXT:    s_or_b64 exec, exec, s[4:5]
 ; GFX90A-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX908-LABEL: buffer_fat_ptr_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_memory:
 ; GFX908:       ; %bb.0:
 ; GFX908-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX908-NEXT:    s_mov_b32 s4, s6
-; GFX908-NEXT:    s_mov_b32 s5, s7
-; GFX908-NEXT:    s_mov_b32 s6, s16
-; GFX908-NEXT:    s_mov_b32 s7, s17
-; GFX908-NEXT:    v_mov_b32_e32 v2, s18
-; GFX908-NEXT:    buffer_load_dword v1, v2, s[4:7], 0 offen
-; GFX908-NEXT:    s_mov_b64 s[8:9], 0
+; GFX908-NEXT:    v_mov_b32_e32 v2, s20
+; GFX908-NEXT:    buffer_load_dword v1, v2, s[16:19], 0 offen
+; GFX908-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX908-NEXT:    v_max_f32_e32 v3, v0, v0
 ; GFX908-NEXT:  .LBB13_1: ; %atomicrmw.start
 ; GFX908-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -1776,28 +1744,24 @@ define void @buffer_fat_ptr_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_
 ; GFX908-NEXT:    v_max_f32_e32 v0, v0, v3
 ; GFX908-NEXT:    v_mov_b32_e32 v5, v1
 ; GFX908-NEXT:    v_mov_b32_e32 v4, v0
-; GFX908-NEXT:    buffer_atomic_cmpswap v[4:5], v2, s[4:7], 0 offen glc
+; GFX908-NEXT:    buffer_atomic_cmpswap v[4:5], v2, s[16:19], 0 offen glc
 ; GFX908-NEXT:    s_waitcnt vmcnt(0)
 ; GFX908-NEXT:    buffer_wbinvl1
 ; GFX908-NEXT:    v_cmp_eq_u32_e32 vcc, v4, v1
-; GFX908-NEXT:    s_or_b64 s[8:9], vcc, s[8:9]
+; GFX908-NEXT:    s_or_b64 s[4:5], vcc, s[4:5]
 ; GFX908-NEXT:    v_mov_b32_e32 v1, v4
-; GFX908-NEXT:    s_andn2_b64 exec, exec, s[8:9]
+; GFX908-NEXT:    s_andn2_b64 exec, exec, s[4:5]
 ; GFX908-NEXT:    s_cbranch_execnz .LBB13_1
 ; GFX908-NEXT:  ; %bb.2: ; %atomicrmw.end
-; GFX908-NEXT:    s_or_b64 exec, exec, s[8:9]
+; GFX908-NEXT:    s_or_b64 exec, exec, s[4:5]
 ; GFX908-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: buffer_fat_ptr_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_memory:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; G...
[truncated]

@shiltian shiltian force-pushed the users/shiltian/autogen-andorbitset branch from 700bdb6 to 0876ac6 Compare October 16, 2024 23:13
@shiltian shiltian force-pushed the users/shiltian/queue-ptr-cov5 branch from 8c9fc97 to 958a04e Compare October 16, 2024 23:13
@shiltian shiltian force-pushed the users/shiltian/autogen-andorbitset branch from 0876ac6 to c403ecc Compare October 16, 2024 23:55
@shiltian shiltian force-pushed the users/shiltian/queue-ptr-cov5 branch from 958a04e to 1abf3f2 Compare October 16, 2024 23:55
@shiltian shiltian force-pushed the users/shiltian/autogen-andorbitset branch from c403ecc to 76a1bb2 Compare October 17, 2024 00:06
@shiltian shiltian force-pushed the users/shiltian/queue-ptr-cov5 branch from 1abf3f2 to 128a266 Compare October 17, 2024 00:06
@shiltian shiltian force-pushed the users/shiltian/autogen-andorbitset branch from 76a1bb2 to 4bd5d5d Compare October 17, 2024 00:09
@shiltian shiltian force-pushed the users/shiltian/queue-ptr-cov5 branch from 128a266 to c9ff8ab Compare October 17, 2024 00:09
@shiltian shiltian marked this pull request as ready for review October 17, 2024 00:18
@shiltian shiltian requested a review from jayfoad October 17, 2024 00:18
@shiltian
Contributor Author

The only test failure is CodeGen/AMDGPU/call-args-inreg.ll. It crashes when lowering function test_call_external_void_func_a15i32_inreg.

report_fatal_error("failed to find free scratch register");

@arsenm
Contributor

arsenm commented Oct 17, 2024

> The only test failure is CodeGen/AMDGPU/call-args-inreg.ll. It crashes when lowering function test_call_external_void_func_a15i32_inreg.
>
> report_fatal_error("failed to find free scratch register");

Something is wrong with how inreg is interacting with special arguments at callsites. The special arguments shouldn't be changing the number of registers available for user arguments.

Base automatically changed from users/shiltian/autogen-andorbitset to main October 17, 2024 14:55
@shiltian
Contributor Author

shiltian commented Oct 18, 2024

> The special arguments shouldn't be changing the number of registers available for user arguments.

The register is actually not needed for user arguments but for exec, when emitting the function prologue. All non-callee-saved SGPRs are already allocated at that point.

The fix might be to spill one SGPR or reserve one ahead of time when allocating SGPRs for arguments.

@arsenm
Contributor

arsenm commented Oct 19, 2024

> The fix might be to spill one SGPR or reserve one ahead of time when allocating SGPRs for arguments.

We should probably just always reserve a register for these spill situations. The SGPR argument part is incidental.
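
To make the two options discussed above concrete, here is a minimal toy model of "reserve one SGPR ahead of time". It is only a sketch: the function names, the register list, and the choice of which register to hold back are all hypothetical and do not correspond to the actual AMDGPU backend code. The only thing taken from the thread is the error string.

```python
def allocate_argument_sgprs(free_sgprs, num_args, reserve_scratch):
    """Hand out SGPRs to arguments, optionally holding one back (toy model).

    Returns (in_regs, on_stack, scratch, leftover):
      in_regs  - registers given to arguments
      on_stack - number of arguments that did not fit in registers
      scratch  - the register held back for the prologue, or None
      leftover - registers that no argument needed
    """
    pool = list(free_sgprs)
    # Hypothetical policy: hold back the last free register.
    scratch = pool.pop() if (reserve_scratch and pool) else None
    in_regs = pool[:num_args]
    on_stack = max(0, num_args - len(pool))
    leftover = pool[num_args:]
    return in_regs, on_stack, scratch, leftover


def pick_prologue_scratch(scratch, leftover):
    """Find a register the prologue may clobber, or fail like the crash above."""
    if scratch is not None:
        return scratch
    if leftover:
        return leftover[0]
    raise RuntimeError("failed to find free scratch register")
```

With four free registers and four arguments, the unreserved variant leaves nothing for the prologue and raises the error, while the reserved variant passes one argument on the stack and always has a scratch register available.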

@shiltian shiltian force-pushed the users/shiltian/queue-ptr-cov5 branch from 41ec3fb to 7abe2a6 Compare November 8, 2024 16:16
shiltian added a commit that referenced this pull request Nov 8, 2024
We’re facing an issue (#113782) that is currently blocking #112403. However,
since #112403 involves extensive test changes, I’d prefer to land it as soon as
possible. This PR reorganizes the tests by moving test cases expected to fail
into a separate file. Additionally, it changes the `[15 x i32]` arguments to
`[13 x i32]` to bypass the issue.
@shiltian shiltian force-pushed the users/shiltian/disable-test-no-sgpr-for-csrspill branch from 9b97b93 to b289eef Compare November 8, 2024 17:57
Base automatically changed from users/shiltian/disable-test-no-sgpr-for-csrspill to main November 8, 2024 18:00
@shiltian shiltian force-pushed the users/shiltian/queue-ptr-cov5 branch from 7abe2a6 to d972769 Compare November 8, 2024 18:02
Contributor Author

shiltian commented Nov 8, 2024

Merge activity

  • Nov 8, 1:03 PM EST: A user started a stack merge that includes this pull request via Graphite.
  • Nov 8, 1:05 PM EST: A user merged this pull request with Graphite.

@shiltian shiltian merged commit e215a1e into main Nov 8, 2024
5 of 6 checks passed
@shiltian shiltian deleted the users/shiltian/queue-ptr-cov5 branch November 8, 2024 18:05
@llvm-ci
Collaborator

llvm-ci commented Nov 8, 2024

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime running on omp-vega20-0 while building llvm at step 7 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/9877

Here is the relevant piece of the build log for the reference
Step 7 (Add check check-offload) failure: test (failure)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: api/ompx_sync.c' FAILED ********************
Exit Code: 2

Command Output (stdout):
--
# RUN: at line 1
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/api/ompx_sync.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/api/Output/ompx_sync.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a && /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/api/Output/ompx_sync.c.tmp | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/api/ompx_sync.c
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/api/ompx_sync.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/api/Output/ompx_sync.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/api/Output/ompx_sync.c.tmp
# .---command stderr------------
# | Display only launched kernel:
# | Kernel 'omp target in foo @ 10 (__omp_offloading_802_b38825e_foo_l10)'
# | OFFLOAD ERROR: Memory access fault by GPU 1 (agent 0x559a726e41f0) at virtual address (nil). Reasons: Page not present or supervisor privilege
# | Use 'OFFLOAD_TRACK_ALLOCATION_TRACES=true' to track device allocations
# `-----------------------------
# error: command failed with exit status: -6
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/api/ompx_sync.c
# .---command stderr------------
# | FileCheck error: '<stdin>' is empty.
# | FileCheck command line:  /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/api/ompx_sync.c
# `-----------------------------
# error: command failed with exit status: 2

--

********************


@jplehr
Contributor

jplehr commented Nov 8, 2024

Hey @shiltian, I think this patch also broke the HIP bot (https://lab.llvm.org/buildbot/#/builders/123/builds/9031) and other OpenMP Offload bots.
Can you please take a look?

@arsenm
Contributor

arsenm commented Nov 8, 2024

> https://lab.llvm.org/buildbot/#/builders/123/builds/9031

That's one of the flaky tests that randomly fails

@jplehr
Contributor

jplehr commented Nov 8, 2024

> > https://lab.llvm.org/buildbot/#/builders/123/builds/9031
>
> That's one of the flaky tests that randomly fails

I thought the test that had been failing randomly for some time was already disabled, yet the HIP bot is still red.

All OpenMP bots are also red.
https://lab.llvm.org/buildbot/#/builders/30
https://lab.llvm.org/buildbot/#/builders/73
https://lab.llvm.org/staging/#/builders/105
https://lab.llvm.org/staging/#/builders/97
https://lab.llvm.org/staging/#/builders/130

@shiltian
Contributor Author

shiltian commented Nov 8, 2024

This change is likely to expose an issue in the register allocation, as we now lose 2 SGPRs by default.
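
The effect of losing two SGPRs can be read off the GFX10 and GFX11 check lines in the diff above: on GFX10 the buffer resource moves from s6, s7, s16, s17 to s[16:19] and the offset from s18 to s20, while on GFX11 the resource stays in s[0:3] and the offset moves from s6 to s16. The sketch below reproduces just that shift. It assumes, based only on those check lines, that s6 and s7 are the two registers that stop being available; the helper names and the upper bound of the pool are hypothetical and this is not the backend's real calling-convention logic.

```python
def free_inreg_sgprs(queue_ptr_reserved, leading=()):
    """Toy free list of SGPRs for inreg arguments.

    leading            - extra low registers available on some targets
                         (e.g. s0-s3 in the GFX11 check lines)
    queue_ptr_reserved - when True, s6 and s7 are no longer handed out
    """
    pool = list(leading)
    if not queue_ptr_reserved:
        pool += [6, 7]
    pool += range(16, 30)  # upper bound chosen arbitrarily for the sketch
    return pool


def assign_inreg(num_dwords, queue_ptr_reserved, leading=()):
    """Assign num_dwords consecutive entries from the toy free list."""
    pool = free_inreg_sgprs(queue_ptr_reserved, leading)
    if num_dwords > len(pool):
        raise ValueError("not enough SGPRs in this toy model")
    return [f"s{n}" for n in pool[:num_dwords]]
```

Calling `assign_inreg(5, False)` and `assign_inreg(5, True)` gives the before and after GFX10 assignments for the four resource dwords plus the offset; passing `leading=(0, 1, 2, 3)` gives the GFX11 pattern.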

shiltian added a commit that referenced this pull request Nov 8, 2024
…COV5 (#112403)"

This reverts commit e215a1e as it broke both
hip and openmp buildbots.
shiltian added a commit that referenced this pull request Nov 9, 2024
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Nov 9, 2024
… COV5 (llvm#112403)"

This reverts commit ca33649.

Change-Id: Icb47ca972ee762362bb4bc0d1c04e2592e03932f
@kosarev
Collaborator

kosarev commented Nov 13, 2024

@shiltian This seems to unintentionally add the binary file llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.o to tests.

@shiltian
Contributor Author

> @shiltian This seems to unintentionally add the binary file llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.o to tests.

Yes, indeed. Thanks! I fixed it downstream and I thought it was added there.

Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Nov 15, 2024
We’re facing an issue (llvm#113782) that is currently blocking llvm#112403. However,
since llvm#112403 involves extensive test changes, I’d prefer to land it as soon as
possible. This PR reorganizes the tests by moving test cases expected to fail
into a separate file. Additionally, it changes the `[15 x i32]` arguments to
`[13 x i32]` to bypass the issue.
Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Nov 15, 2024
Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Nov 15, 2024
…COV5 (llvm#112403)"

This reverts commit e215a1e as it broke both
hip and openmp buildbots.
Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Nov 15, 2024
Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Nov 15, 2024
nico pushed a commit that referenced this pull request Nov 15, 2024
Mistimed rebase for #112251 which added new tests which did not consider
the changes introduced in #112403 yet
abidh pushed a commit to abidh/llvm-project that referenced this pull request Feb 4, 2025
…ass (llvm#102913)

Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction
pass. Moves function resource info propagation to the MC layer (through
helpers in AMDGPUMCResourceInfo) by generating MCExprs for every
function resource which the emitters have been prepped for.

Fixes llvm#64863

[AMDGPU] Fix stack size metadata for functions with direct and indirect calls (llvm#110828)

When a function has an external call, it should still use the stack
sizes of direct, known calls to calculate its own stack size.

[AMDGPU] Fix resource usage information for unnamed functions (llvm#115320)

Resource usage information would try to overwrite unnamed functions if
there are multiple within the same compilation unit. This aims to either
use the `MCSymbol` assigned to the unnamed function (i.e.,
`CurrentFnSym`), or, rematerialize the `MCSymbol` for the unnamed
function.

Reapply [AMDGPU] Avoid resource propagation for recursion through multiple functions (llvm#112251)

I was wrong last patch. I viewed the `Visited` set purely as a possible
recursion deterrent where functions calling a callee multiple times are
handled elsewhere. This doesn't consider cases where a function is
called multiple times by different callers still part of the same call
graph. New test shows the aforementioned case.

Reapplies llvm#111004, fixes llvm#115562.

[AMDGPU] Newly added test modified for recent SGPR use change (llvm#116427)

Mistimed rebase for llvm#112251 which added new tests which did not consider
the changes introduced in llvm#112403 yet

Change-Id: I4dfe6a1b679137e080a6d2b44016347ea704b014

6 participants