Skip to content

AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. #95592

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

arsenm
Copy link
Contributor

@arsenm arsenm commented Jun 14, 2024

Define subtarget features for atomic fmin/fmax support.

The flat/global support is a real messe. We had float/double support at
the beginning in gfx6 and gfx7. gfx8 removed these. gfx10 reintroduced them.
gfx11 removed the f64 versions again.

gfx9 partially reintroduced them, in gfx90a and gfx940 but only for f64.

Copy link
Contributor Author

arsenm commented Jun 14, 2024

@llvmbot
Copy link
Member

llvmbot commented Jun 14, 2024

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Define subtarget features for atomic fmin/fmax support.

The flat/global support is a real messe. We had float/double support at
the beginning in gfx6 and gfx7. gfx8 removed these. gfx10 reintroduced them.
gfx11 removed the f64 versions again.

gfx9 partially reintroduced them, in gfx90a and gfx940 but only for f64.


Patch is 1.39 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/95592.diff

21 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+63-8)
  • (modified) llvm/lib/Target/AMDGPU/BUFInstructions.td (+27-12)
  • (modified) llvm/lib/Target/AMDGPU/FLATInstructions.td (+55-30)
  • (modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+20)
  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+22)
  • (modified) llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fmax.ll (+155-1654)
  • (modified) llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fmin.ll (+155-1654)
  • (modified) llvm/test/CodeGen/AMDGPU/global-atomicrmw-fmax.ll (+181-2141)
  • (modified) llvm/test/CodeGen/AMDGPU/global-atomicrmw-fmin.ll (+181-2141)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmax.ll (+490-1737)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmin.ll (+490-1737)
  • (added) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.ptr.buffer.atomic.fmax.f32.ll (+638)
  • (added) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.ptr.buffer.atomic.fmax.f64.ll (+271)
  • (added) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.ptr.buffer.atomic.fmin.f32.ll (+638)
  • (added) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.ptr.buffer.atomic.fmin.f64.ll (+271)
  • (modified) llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-f32-agent.ll (+1584-256)
  • (modified) llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-f32-system.ll (+1584-256)
  • (modified) llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-f64-agent.ll (+792-128)
  • (modified) llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-f64-system.ll (+792-128)
  • (modified) llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fmax.ll (+97-77)
  • (modified) llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fmin.ll (+97-77)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 0a1550ccb53c4..2f4ca847096a1 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -351,6 +351,7 @@ def FeatureGFX90AInsts : SubtargetFeature<"gfx90a-insts",
   "GFX90AInsts",
   "true",
   "Additional instructions for GFX90A+"
+  // [HasAtomicFMinFMaxF64GlobalInsts, HasAtomicFMinFMaxF64FlatInsts] // TODO
 >;
 
 def FeatureGFX940Insts : SubtargetFeature<"gfx940-insts",
@@ -711,6 +712,30 @@ def FeatureAtomicFaddRtnInsts : SubtargetFeature<"atomic-fadd-rtn-insts",
   [FeatureFlatGlobalInsts]
 >;
 
+def FeatureAtomicFMinFMaxF32GlobalInsts : SubtargetFeature<"atomic-fmin-fmax-global-f32",
+  "HasAtomicFMinFMaxF32GlobalInsts",
+  "true",
+  "Has global/buffer instructions for atomicrmw fmin/fmax for float"
+>;
+
+def FeatureAtomicFMinFMaxF64GlobalInsts : SubtargetFeature<"atomic-fmin-fmax-global-f64",
+  "HasAtomicFMinFMaxF64GlobalInsts",
+  "true",
+  "Has global/buffer instructions for atomicrmw fmin/fmax for float"
+>;
+
+def FeatureAtomicFMinFMaxF32FlatInsts : SubtargetFeature<"atomic-fmin-fmax-flat-f32",
+  "HasAtomicFMinFMaxF32FlatInsts",
+  "true",
+  "Has flat memory instructions for atomicrmw fmin/fmax for float"
+>;
+
+def FeatureAtomicFMinFMaxF64FlatInsts : SubtargetFeature<"atomic-fmin-fmax-flat-f64",
+  "HasAtomicFMinFMaxF64FlatInsts",
+  "true",
+  "Has flat memory instructions for atomicrmw fmin/fmax for double"
+>;
+
 def FeatureAtomicFaddNoRtnInsts : SubtargetFeature<"atomic-fadd-no-rtn-insts",
   "HasAtomicFaddNoRtnInsts",
   "true",
@@ -1061,7 +1086,8 @@ def FeatureSouthernIslands : GCNSubtargetFeatureGeneration<"SOUTHERN_ISLANDS",
   FeatureWavefrontSize64, FeatureSMemTimeInst, FeatureMadMacF32Insts,
   FeatureDsSrc2Insts, FeatureLDSBankCount32, FeatureMovrel,
   FeatureTrigReducedRange, FeatureExtendedImageInsts, FeatureImageInsts,
-  FeatureGDS, FeatureGWS, FeatureDefaultComponentZero
+  FeatureGDS, FeatureGWS, FeatureDefaultComponentZero,
+  FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts
   ]
 >;
 
@@ -1072,7 +1098,9 @@ def FeatureSeaIslands : GCNSubtargetFeatureGeneration<"SEA_ISLANDS",
   FeatureCIInsts, FeatureMovrel, FeatureTrigReducedRange,
   FeatureGFX7GFX8GFX9Insts, FeatureSMemTimeInst, FeatureMadMacF32Insts,
   FeatureDsSrc2Insts, FeatureExtendedImageInsts, FeatureUnalignedBufferAccess,
-  FeatureImageInsts, FeatureGDS, FeatureGWS, FeatureDefaultComponentZero
+  FeatureImageInsts, FeatureGDS, FeatureGWS, FeatureDefaultComponentZero,
+  FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts,
+  FeatureAtomicFMinFMaxF32FlatInsts, FeatureAtomicFMinFMaxF64FlatInsts
   ]
 >;
 
@@ -1127,7 +1155,9 @@ def FeatureGFX10 : GCNSubtargetFeatureGeneration<"GFX10",
    FeatureA16, FeatureSMemTimeInst, FeatureFastDenormalF32, FeatureG16,
    FeatureUnalignedBufferAccess, FeatureUnalignedDSAccess, FeatureImageInsts,
    FeatureGDS, FeatureGWS, FeatureDefaultComponentZero,
-   FeatureMaxHardClauseLength63
+   FeatureMaxHardClauseLength63,
+   FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts,
+   FeatureAtomicFMinFMaxF32FlatInsts, FeatureAtomicFMinFMaxF64FlatInsts
   ]
 >;
 
@@ -1148,7 +1178,8 @@ def FeatureGFX11 : GCNSubtargetFeatureGeneration<"GFX11",
    FeatureA16, FeatureFastDenormalF32, FeatureG16,
    FeatureUnalignedBufferAccess, FeatureUnalignedDSAccess, FeatureGDS,
    FeatureGWS, FeatureDefaultComponentZero,
-   FeatureMaxHardClauseLength32
+   FeatureMaxHardClauseLength32,
+   FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF32FlatInsts
   ]
 >;
 
@@ -1169,7 +1200,8 @@ def FeatureGFX12 : GCNSubtargetFeatureGeneration<"GFX12",
    FeatureA16, FeatureFastDenormalF32, FeatureG16,
    FeatureUnalignedBufferAccess, FeatureUnalignedDSAccess,
    FeatureTrue16BitInsts, FeatureDefaultComponentBroadcast,
-   FeatureMaxHardClauseLength32
+   FeatureMaxHardClauseLength32,
+   FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF32FlatInsts
   ]
 >;
 
@@ -1332,7 +1364,10 @@ def FeatureISAVersion9_0_A : FeatureSet<
      FeaturePackedTID,
      FullRate64Ops,
      FeatureBackOffBarrier,
-     FeatureKernargPreload])>;
+     FeatureKernargPreload,
+     FeatureAtomicFMinFMaxF64GlobalInsts,
+     FeatureAtomicFMinFMaxF64FlatInsts
+     ])>;
 
 def FeatureISAVersion9_0_C : FeatureSet<
   !listconcat(FeatureISAVersion9_0_Consumer_Common.Features,
@@ -1372,7 +1407,10 @@ def FeatureISAVersion9_4_Common : FeatureSet<
    FeatureArchitectedFlatScratch,
    FullRate64Ops,
    FeatureBackOffBarrier,
-   FeatureKernargPreload]>;
+   FeatureKernargPreload,
+   FeatureAtomicFMinFMaxF64GlobalInsts,
+   FeatureAtomicFMinFMaxF64FlatInsts
+   ]>;
 
 def FeatureISAVersion9_4_0 : FeatureSet<
   !listconcat(FeatureISAVersion9_4_Common.Features,
@@ -1862,11 +1900,28 @@ def isGFX12Plus :
 def HasFlatAddressSpace : Predicate<"Subtarget->hasFlatAddressSpace()">,
   AssemblerPredicate<(all_of FeatureFlatAddressSpace)>;
 
-def HasBufferFlatGlobalAtomicsF64 :
+
+def HasBufferFlatGlobalAtomicsF64 : // FIXME: Rename to show it's only for fadd
   Predicate<"Subtarget->hasBufferFlatGlobalAtomicsF64()">,
   // FIXME: This is too coarse, and working around using pseudo's predicates on real instruction.
   AssemblerPredicate<(any_of FeatureGFX90AInsts, FeatureGFX10Insts, FeatureSouthernIslands, FeatureSeaIslands)>;
 
+def HasAtomicFMinFMaxF32GlobalInsts :
+  Predicate<"Subtarget->hasAtomicFMinFMaxF32GlobalInsts()">,
+  AssemblerPredicate<(any_of FeatureAtomicFMinFMaxF32GlobalInsts)>;
+
+def HasAtomicFMinFMaxF64GlobalInsts :
+  Predicate<"Subtarget->hasAtomicFMinFMaxF64GlobalInsts()">,
+  AssemblerPredicate<(any_of FeatureAtomicFMinFMaxF64GlobalInsts)>;
+
+def HasAtomicFMinFMaxF32FlatInsts :
+  Predicate<"Subtarget->hasAtomicFMinFMaxF32FlatInsts()">,
+  AssemblerPredicate<(any_of FeatureAtomicFMinFMaxF32FlatInsts)>;
+
+def HasAtomicFMinFMaxF64FlatInsts :
+  Predicate<"Subtarget->hasAtomicFMinFMaxF64FlatInsts()">,
+  AssemblerPredicate<(any_of FeatureAtomicFMinFMaxF64FlatInsts)>;
+
 def HasLdsAtomicAddF64 :
   Predicate<"Subtarget->hasLdsAtomicAddF64()">,
   AssemblerPredicate<(any_of FeatureGFX90AInsts)>;
diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td
index 9d21f93a957cc..7a2ac76854efb 100644
--- a/llvm/lib/Target/AMDGPU/BUFInstructions.td
+++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td
@@ -1149,21 +1149,21 @@ let SubtargetPredicate = isGFX6GFX7GFX10Plus in {
 defm BUFFER_ATOMIC_FCMPSWAP : MUBUF_Pseudo_Atomics <
   "buffer_atomic_fcmpswap", VReg_64, v2f32, null_frag
 >;
+}
+
+let SubtargetPredicate = HasAtomicFMinFMaxF32GlobalInsts in {
 defm BUFFER_ATOMIC_FMIN : MUBUF_Pseudo_Atomics <
   "buffer_atomic_fmin", VGPR_32, f32, null_frag
 >;
 defm BUFFER_ATOMIC_FMAX : MUBUF_Pseudo_Atomics <
   "buffer_atomic_fmax", VGPR_32, f32, null_frag
 >;
-
 }
 
 let SubtargetPredicate = isGFX6GFX7GFX10 in {
-
 defm BUFFER_ATOMIC_FCMPSWAP_X2 : MUBUF_Pseudo_Atomics <
   "buffer_atomic_fcmpswap_x2", VReg_128, v2f64, null_frag
 >;
-
 }
 
 let SubtargetPredicate = HasD16LoadStore in {
@@ -1657,6 +1657,16 @@ defm : BufferAtomicPat<"atomic_load_udec_wrap_global", Ty, "BUFFER_ATOMIC_DEC" #
 
 } // end foreach Ty
 
+let SubtargetPredicate = HasAtomicFMinFMaxF32GlobalInsts in {
+defm : BufferAtomicPat<"atomic_load_fmin_global", f32, "BUFFER_ATOMIC_FMIN">;
+defm : BufferAtomicPat<"atomic_load_fmax_global", f32, "BUFFER_ATOMIC_FMAX">;
+}
+
+let SubtargetPredicate = HasAtomicFMinFMaxF64GlobalInsts in {
+defm : BufferAtomicPat<"atomic_load_fmin_global", f64, "BUFFER_ATOMIC_MIN_F64">;
+defm : BufferAtomicPat<"atomic_load_fmax_global", f64, "BUFFER_ATOMIC_MAX_F64">;
+}
+
 defm : BufferAtomicCmpSwapPat<i32, v2i32, "BUFFER_ATOMIC_CMPSWAP">;
 defm : BufferAtomicCmpSwapPat<i64, v2i64, "BUFFER_ATOMIC_CMPSWAP_X2">;
 
@@ -1711,7 +1721,7 @@ multiclass SIBufferAtomicPat_Common<string OpPrefix, ValueType vt, string Inst,
 
 multiclass SIBufferAtomicPat<string OpPrefix, ValueType vt, string Inst,
                              list<string> RtnModes = ["ret", "noret"]> {
-  let SubtargetPredicate = HasUnrestrictedSOffset in {
+  let OtherPredicates = [HasUnrestrictedSOffset] in {
     defm : SIBufferAtomicPat_Common<OpPrefix, vt, Inst, RtnModes>;
   }
   defm : SIBufferAtomicPat_Common<OpPrefix, vt, Inst # "_VBUFFER", RtnModes>;
@@ -1744,22 +1754,24 @@ defm : SIBufferAtomicPat<"SIbuffer_atomic_xor", i64, "BUFFER_ATOMIC_XOR_X2">;
 defm : SIBufferAtomicPat<"SIbuffer_atomic_inc", i64, "BUFFER_ATOMIC_INC_X2">;
 defm : SIBufferAtomicPat<"SIbuffer_atomic_dec", i64, "BUFFER_ATOMIC_DEC_X2">;
 
-let OtherPredicates = [HasAtomicCSubNoRtnInsts] in
+let SubtargetPredicate = HasAtomicCSubNoRtnInsts in
 defm : SIBufferAtomicPat<"SIbuffer_atomic_csub", i32, "BUFFER_ATOMIC_CSUB", ["noret"]>;
 
 let SubtargetPredicate = isGFX12Plus in {
   defm : SIBufferAtomicPat_Common<"SIbuffer_atomic_fadd_bf16", v2bf16, "BUFFER_ATOMIC_PK_ADD_BF16_VBUFFER">;
   defm : SIBufferAtomicPat_Common<"SIbuffer_atomic_cond_sub_u32", i32, "BUFFER_ATOMIC_COND_SUB_U32_VBUFFER", ["ret"]>;
+}
 
-  let OtherPredicates = [HasAtomicCSubNoRtnInsts] in
+let SubtargetPredicate = HasAtomicCSubNoRtnInsts in {
   defm : SIBufferAtomicPat_Common<"SIbuffer_atomic_cond_sub_u32", i32, "BUFFER_ATOMIC_COND_SUB_U32_VBUFFER", ["noret"]>;
 }
 
-let OtherPredicates = [isGFX6GFX7GFX10Plus] in {
+let SubtargetPredicate = HasAtomicFMinFMaxF32GlobalInsts in {
   defm : SIBufferAtomicPat<"SIbuffer_atomic_fmin", f32, "BUFFER_ATOMIC_FMIN">;
   defm : SIBufferAtomicPat<"SIbuffer_atomic_fmax", f32, "BUFFER_ATOMIC_FMAX">;
 }
-let SubtargetPredicate = isGFX6GFX7GFX10 in {
+
+let SubtargetPredicate = HasAtomicFMinFMaxF64GlobalInsts in {
   defm : SIBufferAtomicPat<"SIbuffer_atomic_fmin", f64, "BUFFER_ATOMIC_MIN_F64">;
   defm : SIBufferAtomicPat<"SIbuffer_atomic_fmax", f64, "BUFFER_ATOMIC_MAX_F64">;
 }
@@ -1815,7 +1827,7 @@ multiclass BufferAtomicPatterns_NO_RTN<SDPatternOperator name, ValueType vt,
   defm : BufferAtomicPatterns_NO_RTN_Common<name, vt, opcode # "_VBUFFER">;
 }
 
-let OtherPredicates = [HasAtomicFaddNoRtnInsts] in
+let SubtargetPredicate = HasAtomicFaddNoRtnInsts in
   defm : SIBufferAtomicPat<"SIbuffer_atomic_fadd", f32, "BUFFER_ATOMIC_ADD_F32", ["noret"]>;
 
 let OtherPredicates = [HasAtomicBufferGlobalPkAddF16NoRtnInsts] in {
@@ -1826,7 +1838,7 @@ let OtherPredicates = [HasAtomicBufferGlobalPkAddF16NoRtnInsts] in {
   defm : SIBufferAtomicPat_Common<"SIbuffer_atomic_fadd", v2f16, "BUFFER_ATOMIC_PK_ADD_F16_VBUFFER", ["noret"]>;
 } // End OtherPredicates = [HasAtomicBufferGlobalPkAddF16NoRtnInsts]
 
-let OtherPredicates = [HasAtomicFaddRtnInsts] in
+let SubtargetPredicate = HasAtomicFaddRtnInsts in
   defm : SIBufferAtomicPat<"SIbuffer_atomic_fadd", f32, "BUFFER_ATOMIC_ADD_F32", ["ret"]>;
 
 let OtherPredicates = [HasAtomicBufferGlobalPkAddF16Insts] in {
@@ -1837,11 +1849,14 @@ let OtherPredicates = [HasAtomicBufferGlobalPkAddF16Insts] in {
   defm : SIBufferAtomicPat_Common<"SIbuffer_atomic_fadd", v2f16, "BUFFER_ATOMIC_PK_ADD_F16_VBUFFER", ["ret"]>;
 } // End OtherPredicates = [HasAtomicBufferGlobalPkAddF16Insts]
 
-let OtherPredicates = [HasBufferFlatGlobalAtomicsF64] in {
+let SubtargetPredicate = HasBufferFlatGlobalAtomicsF64 in {
   defm : SIBufferAtomicPat<"SIbuffer_atomic_fadd", f64, "BUFFER_ATOMIC_ADD_F64">;
+} // End SubtargetPredicate = HasBufferFlatGlobalAtomicsF64
+
+let SubtargetPredicate = HasAtomicFMinFMaxF64GlobalInsts in {
   defm : SIBufferAtomicPat<"SIbuffer_atomic_fmin", f64, "BUFFER_ATOMIC_MIN_F64">;
   defm : SIBufferAtomicPat<"SIbuffer_atomic_fmax", f64, "BUFFER_ATOMIC_MAX_F64">;
-} // End SubtargetPredicate = HasBufferFlatGlobalAtomicsF64
+} //End let SubtargetPredicate = HasAtomicFMinFMaxF64GlobalInsts
 
 multiclass SIBufferAtomicCmpSwapPat_Common<ValueType vt, ValueType data_vt, string Inst> {
   foreach RtnMode = ["ret", "noret"] in {
diff --git a/llvm/lib/Target/AMDGPU/FLATInstructions.td b/llvm/lib/Target/AMDGPU/FLATInstructions.td
index a1388b41db428..2824d6fdafdb3 100644
--- a/llvm/lib/Target/AMDGPU/FLATInstructions.td
+++ b/llvm/lib/Target/AMDGPU/FLATInstructions.td
@@ -752,19 +752,29 @@ defm FLAT_ATOMIC_DEC_X2     : FLAT_Atomic_Pseudo <"flat_atomic_dec_x2",
 
 // GFX7-, GFX10-only flat instructions.
 let SubtargetPredicate = isGFX7GFX10 in {
-
 defm FLAT_ATOMIC_FCMPSWAP_X2 : FLAT_Atomic_Pseudo <"flat_atomic_fcmpswap_x2",
                                 VReg_64, f64, v2f64, VReg_128>;
-
 } // End SubtargetPredicate = isGFX7GFX10
 
+
+// The names may be flat_atomic_fmin_x2 on some subtargets, but we
+// choose this as the canonical name.
+let SubtargetPredicate = HasAtomicFMinFMaxF64FlatInsts in {
+defm FLAT_ATOMIC_MIN_F64 : FLAT_Atomic_Pseudo <"flat_atomic_min_f64",
+                                               VReg_64, f64>;
+
+defm FLAT_ATOMIC_MAX_F64 : FLAT_Atomic_Pseudo <"flat_atomic_max_f64",
+                                                VReg_64, f64>;
+}
+
+let SubtargetPredicate = HasAtomicFMinFMaxF64GlobalInsts in {
+defm GLOBAL_ATOMIC_MIN_F64 : FLAT_Global_Atomic_Pseudo<"global_atomic_min_f64", VReg_64, f64>;
+defm GLOBAL_ATOMIC_MAX_F64 : FLAT_Global_Atomic_Pseudo<"global_atomic_max_f64", VReg_64, f64>;
+}
+
 let SubtargetPredicate = HasBufferFlatGlobalAtomicsF64 in {
   defm FLAT_ATOMIC_ADD_F64   : FLAT_Atomic_Pseudo<"flat_atomic_add_f64", VReg_64, f64>;
-  defm FLAT_ATOMIC_MIN_F64   : FLAT_Atomic_Pseudo<"flat_atomic_min_f64", VReg_64, f64>;
-  defm FLAT_ATOMIC_MAX_F64   : FLAT_Atomic_Pseudo<"flat_atomic_max_f64", VReg_64, f64>;
   defm GLOBAL_ATOMIC_ADD_F64 : FLAT_Global_Atomic_Pseudo<"global_atomic_add_f64", VReg_64, f64>;
-  defm GLOBAL_ATOMIC_MIN_F64 : FLAT_Global_Atomic_Pseudo<"global_atomic_min_f64", VReg_64, f64>;
-  defm GLOBAL_ATOMIC_MAX_F64 : FLAT_Global_Atomic_Pseudo<"global_atomic_max_f64", VReg_64, f64>;
 } // End SubtargetPredicate = HasBufferFlatGlobalAtomicsF64
 
 let SubtargetPredicate = HasAtomicFlatPkAdd16Insts in {
@@ -1421,6 +1431,17 @@ defm : FlatAtomicPat <"FLAT_ATOMIC_OR_X2", "atomic_load_or_"#as, i64>;
 defm : FlatAtomicPat <"FLAT_ATOMIC_SWAP_X2", "atomic_swap_"#as, i64>;
 defm : FlatAtomicPat <"FLAT_ATOMIC_CMPSWAP_X2", "AMDGPUatomic_cmp_swap_"#as, i64, v2i64>;
 defm : FlatAtomicPat <"FLAT_ATOMIC_XOR_X2", "atomic_load_xor_"#as, i64>;
+
+let SubtargetPredicate = HasAtomicFMinFMaxF32FlatInsts in {
+defm : FlatSignedAtomicPat <"FLAT_ATOMIC_FMIN", "atomic_load_fmin_"#as, f32>;
+defm : FlatSignedAtomicPat <"FLAT_ATOMIC_FMAX", "atomic_load_fmax_"#as, f32>;
+}
+
+let SubtargetPredicate = HasAtomicFMinFMaxF64FlatInsts in {
+defm : FlatSignedAtomicPat <"FLAT_ATOMIC_MIN_F64", "atomic_load_fmin_"#as, f64>;
+defm : FlatSignedAtomicPat <"FLAT_ATOMIC_MAX_F64", "atomic_load_fmax_"#as, f64>;
+}
+
 } // end foreach as
 
 let SubtargetPredicate = isGFX12Plus in {
@@ -1582,33 +1603,33 @@ let OtherPredicates = [isGFX12Plus] in {
   }
 }
 
-let OtherPredicates = [isGFX10Plus] in {
+let SubtargetPredicate = HasAtomicFMinFMaxF32GlobalInsts, OtherPredicates = [HasFlatGlobalInsts] in {
 defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_FMIN", "atomic_load_fmin_global", f32>;
 defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_FMAX", "atomic_load_fmax_global", f32>;
-defm : FlatSignedAtomicPat <"FLAT_ATOMIC_FMIN", "atomic_load_fmin_flat", f32>;
-defm : FlatSignedAtomicPat <"FLAT_ATOMIC_FMAX", "atomic_load_fmax_flat", f32>;
-}
-
-let OtherPredicates = [isGFX10GFX11] in {
 defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_FMIN", "int_amdgcn_global_atomic_fmin", f32>;
 defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_FMAX", "int_amdgcn_global_atomic_fmax", f32>;
+}
 
+let SubtargetPredicate = HasAtomicFMinFMaxF32FlatInsts in {
+defm : FlatSignedAtomicPat <"FLAT_ATOMIC_FMIN", "atomic_load_fmin_flat", f32>;
+defm : FlatSignedAtomicPat <"FLAT_ATOMIC_FMAX", "atomic_load_fmax_flat", f32>;
 defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_FMIN", "int_amdgcn_flat_atomic_fmin", f32>;
 defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_FMAX", "int_amdgcn_flat_atomic_fmax", f32>;
 }
 
-let OtherPredicates = [isGFX10Only] in {
-defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_MIN_F64", "atomic_load_fmin_global", f64>;
-defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_MAX_F64", "atomic_load_fmax_global", f64>;
-defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_MIN_F64", "int_amdgcn_global_atomic_fmin", f64>;
-defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_MAX_F64", "int_amdgcn_global_atomic_fmax", f64>;
-defm : FlatSignedAtomicPat <"FLAT_ATOMIC_MIN_F64", "atomic_load_fmin_flat", f64>;
-defm : FlatSignedAtomicPat <"FLAT_ATOMIC_MAX_F64", "atomic_load_fmax_flat", f64>;
-defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_MIN_F64", "int_amdgcn_flat_atomic_fmin", f64>;
-defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_MAX_F64", "int_amdgcn_flat_atomic_fmax", f64>;
-}
+// let OtherPredicates = [isGFX10Only] in { // fixme
+// defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_MIN_F64", "atomic_load_fmin_global", f64>;
+// defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_MAX_F64", "atomic_load_fmax_global", f64>;
+// defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_MIN_F64", "int_amdgcn_global_atomic_fmin", f64>;
+// defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_MAX_F64", "int_amdgcn_global_atomic_fmax", f64>;
+// defm : FlatSignedAtomicPat <"FLAT_ATOMIC_MIN_F64", "atomic_load_fmin_flat", f64>;
+// defm : FlatSignedAtomicPat <"FLAT_ATOMIC_MAX_F64", "atomic_load_fmax_flat", f64>;
+// defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_MIN_F64", "int_amdgcn_flat_atomic_fmin", f64>;
+// defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_MAX_F64", "int_amdgcn_flat_atomic_fmax", f64>;
+// }
 
 let OtherPredicates = [isGFX12Only] in {
+  // FIXME: Remove these intrinsics
   defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_FMIN", "int_amdgcn_global_atomic_fmin_num", f32>;
   defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_FMAX", "int_amdgcn_global_atomic_fmax_num", f32>;
   defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_FMIN", "int_amdgcn_flat_atomic_fmin_num", f32>;
@@ -1637,22 +1658,26 @@ defm : GlobalFLATAtomicPatsRtnWithAddrSpace <"GLOBAL_ATOMIC_PK_ADD_F16", "int_am
 defm : GlobalFLATAtomicPatsRtnWithAddrSpace <"GLOBAL_ATOMIC_PK_ADD_F16", "int_amdgcn_global_atomic_fadd", "global_addrspace", v2f16>;
 }
 
-let OtherPredicates = [HasBufferFlatGlobalAtomicsF64] in {
-defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_ADD_F64", "atomic_load_fadd_global", f64>;
+let SubtargetPredicate = HasAtomicFMinFMaxF64GlobalInsts, OtherPredicates = [HasFlatGlobalInsts] in {
 defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_MIN_F64", "atomic_load_fmin_global", f64>;
 defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_MAX_F64", "atomic_load_fmax_global", f64>;
-defm : GlobalFLATAtomicPatsWithAddrSpace<"GLOBAL_ATOMIC_ADD_F64", "int_amdgcn_flat_atomic_fadd", "global_addrspace", f64>;
-defm : GlobalFLATAtomicPatsWithAddrSpace<"GLOBAL_ATOMIC_ADD_F64", "int_amdgcn_global_atomic_fadd", "global_addrspace", f64>;
 defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_MIN_F64", "int_amdgcn_global_atomic_fmin", f64>;
 defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_MAX_F64", "int_amdgcn_global_atomic_fmax", f64>;
-defm : FlatSignedAtomicPat <"FLAT_ATOMIC_ADD_F64", "atomic_load_fadd_flat", f64>;
-defm : FlatSignedAtomicPat <"FLAT_ATOMIC_MIN_F64", "atomic_load_fmin_flat", f64>;
-defm : FlatSignedAtomicPat <"FLAT_ATOMIC_MAX_F64", "atomic_load_fmax_flat", f64>;
-defm : FlatSignedAtomicPatWithAddrSpace <"FLAT_ATOMIC_ADD_F64", "int_amdgcn_flat_atomic_fadd", "flat_addrspace", f64>;
+}
+
+let SubtargetPredicate = HasAtomicFMinFMaxF64FlatInsts in {
 defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_MIN_F64", "int_amdgcn_flat_atomic_fmin", f64>;
 defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_MAX_F64", "int_amdgcn_flat_atomic_fmax", f64>;
 }
 
+let OtherPredicates = [HasBufferFlatGlobalAtomicsF64] in {
+defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_ADD_F64", "atomic_load_fadd_global", f64>;
+defm : GlobalFLATAtomicPatsWithAddrSpace<"GLOBAL_ATOMIC_ADD_F64", "int_amdgcn_flat_atomic_fadd", "global_addrspace", f64>;
+defm : GlobalFLATAtomicPatsWithAddrSpace<"GLOBAL_ATOMIC_ADD_F64", "int_amdgcn_global_atomic_fadd", "global_addrspace", f64>;
+defm : FlatSignedAtomicPat <"FLAT_ATOMIC_ADD_F64", "atomic_load_fadd_flat", f64>;
+defm : FlatSignedAtomicPatWithAddrSpace <"FLAT_ATOMIC_ADD_F64", "int_amdgcn_flat_atomic_fadd", "flat_addrspace", f64>;
+}
+
 let OtherPredicates = [HasFlatAtomicFaddF32Inst] in {
 defm : FlatSignedAtomicPat <"FLAT_ATOMIC_ADD_F32", "atomic_load_fadd_flat", f32>;
 defm : ...
[truncated]

@arsenm arsenm marked this pull request as ready for review June 14, 2024 19:38
Copy link
Contributor

@krzysz00 krzysz00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing anything obviously wrong here, but I don't know if I'm the right person to approve this in

@arsenm arsenm force-pushed the users/arsenm/amdgpu-flat-buffer-atomic-fmin-fmax-pseudos branch 2 times, most recently from 71287fe to a5b973d Compare June 17, 2024 17:56
@arsenm arsenm force-pushed the users/arsenm/amdgpu-select-flat-global-atomicrmw-fmin-fmax branch from 7396f87 to 5e389ee Compare June 17, 2024 17:56
@arsenm arsenm force-pushed the users/arsenm/amdgpu-flat-buffer-atomic-fmin-fmax-pseudos branch from a5b973d to 0187cd4 Compare June 18, 2024 06:40
@arsenm arsenm force-pushed the users/arsenm/amdgpu-select-flat-global-atomicrmw-fmin-fmax branch from 5e389ee to f661aa1 Compare June 18, 2024 06:40
@arsenm arsenm changed the base branch from users/arsenm/amdgpu-flat-buffer-atomic-fmin-fmax-pseudos to users/arsenm/amdgpu-select-atomicrmw-fmin-fmax-local June 18, 2024 06:40
@arsenm arsenm force-pushed the users/arsenm/amdgpu-select-atomicrmw-fmin-fmax-local branch from 6745a06 to 2cb5d11 Compare June 18, 2024 12:48
@arsenm arsenm force-pushed the users/arsenm/amdgpu-select-flat-global-atomicrmw-fmin-fmax branch from f661aa1 to b85a15a Compare June 18, 2024 12:48
@@ -1699,7 +1709,7 @@ multiclass SIBufferAtomicPat_Common<string OpPrefix, ValueType vt, string Inst,

multiclass SIBufferAtomicPat<string OpPrefix, ValueType vt, string Inst,
list<string> RtnModes = ["ret", "noret"]> {
let SubtargetPredicate = HasUnrestrictedSOffset in {
let OtherPredicates = [HasUnrestrictedSOffset] in {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A side question, what is the difference between OtherPredicates and SubtargetPredicate? It looks like you swapped a couple of them here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SubtargetPredicate is just one thing, and OtherPredicates is a list? The usage pattern consistency has been declining for a while

Copy link
Contributor

@shiltian shiltian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG

Base automatically changed from users/arsenm/amdgpu-select-atomicrmw-fmin-fmax-local to main June 18, 2024 16:34
@arsenm arsenm force-pushed the users/arsenm/amdgpu-select-flat-global-atomicrmw-fmin-fmax branch 3 times, most recently from 5853be5 to 58b72f3 Compare June 21, 2024 18:44
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Jun 21, 2024
@arsenm arsenm force-pushed the users/arsenm/amdgpu-select-flat-global-atomicrmw-fmin-fmax branch from 58b72f3 to d61d35a Compare June 21, 2024 21:53
Copy link
Contributor Author

arsenm commented Jun 23, 2024

Merge activity

  • Jun 23, 4:06 AM EDT: @arsenm started a stack merge that includes this pull request via Graphite.
  • Jun 23, 4:08 AM EDT: Graphite rebased this pull request as part of a merge.
  • Jun 23, 4:10 AM EDT: @arsenm merged this pull request with Graphite.

arsenm added 3 commits June 23, 2024 08:08
Define subtarget features for atomic fmin/fmax support.

The flat/global support is a real messe. We had float/double support at
the beginning in gfx6 and gfx7. gfx8 removed these. gfx10 reintroduced them.
gfx11 removed the f64 versions again.

gfx9 partially reintroduced them, in gfx90a and gfx940 but only for f64.
@arsenm arsenm force-pushed the users/arsenm/amdgpu-select-flat-global-atomicrmw-fmin-fmax branch from d61d35a to ff38a3c Compare June 23, 2024 08:08
@arsenm arsenm merged commit a440a96 into main Jun 23, 2024
4 of 7 checks passed
@arsenm arsenm deleted the users/arsenm/amdgpu-select-flat-global-atomicrmw-fmin-fmax branch June 23, 2024 08:10
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jun 23, 2024

LLVM Buildbot has detected a new failure on builder clang-s390x-linux-lnt running on systemz-1 while building clang,llvm at step 7 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/136/builds/93

Here is the relevant piece of the build log for the reference:

Step 7 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'libFuzzer-s390x-default-Linux :: fuzzer-timeout.test' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/stage1/./bin/clang    -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta   --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/llvm/compiler-rt/lib/fuzzer  /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/llvm/compiler-rt/test/fuzzer/TimeoutTest.cpp -o /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/stage1/runtimes/runtimes-bins/compiler-rt/test/fuzzer/S390XDefaultLinuxConfig/Output/fuzzer-timeout.test.tmp-TimeoutTest
+ /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/stage1/./bin/clang -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/llvm/compiler-rt/lib/fuzzer /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/llvm/compiler-rt/test/fuzzer/TimeoutTest.cpp -o /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/stage1/runtimes/runtimes-bins/compiler-rt/test/fuzzer/S390XDefaultLinuxConfig/Output/fuzzer-timeout.test.tmp-TimeoutTest
RUN: at line 2: /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/stage1/./bin/clang    -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta   --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/llvm/compiler-rt/lib/fuzzer  /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/llvm/compiler-rt/test/fuzzer/TimeoutEmptyTest.cpp -o /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/stage1/runtimes/runtimes-bins/compiler-rt/test/fuzzer/S390XDefaultLinuxConfig/Output/fuzzer-timeout.test.tmp-TimeoutEmptyTest
+ /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/stage1/./bin/clang -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/llvm/compiler-rt/lib/fuzzer /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/llvm/compiler-rt/test/fuzzer/TimeoutEmptyTest.cpp -o /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/stage1/runtimes/runtimes-bins/compiler-rt/test/fuzzer/S390XDefaultLinuxConfig/Output/fuzzer-timeout.test.tmp-TimeoutEmptyTest
RUN: at line 3: not  /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/stage1/runtimes/runtimes-bins/compiler-rt/test/fuzzer/S390XDefaultLinuxConfig/Output/fuzzer-timeout.test.tmp-TimeoutTest -timeout=1 2>&1 | FileCheck /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/llvm/compiler-rt/test/fuzzer/fuzzer-timeout.test --check-prefix=TimeoutTest
+ FileCheck /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/llvm/compiler-rt/test/fuzzer/fuzzer-timeout.test --check-prefix=TimeoutTest
+ not /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/stage1/runtimes/runtimes-bins/compiler-rt/test/fuzzer/S390XDefaultLinuxConfig/Output/fuzzer-timeout.test.tmp-TimeoutTest -timeout=1
RUN: at line 12: not  /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/stage1/runtimes/runtimes-bins/compiler-rt/test/fuzzer/S390XDefaultLinuxConfig/Output/fuzzer-timeout.test.tmp-TimeoutTest -timeout=1 /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/llvm/compiler-rt/test/fuzzer/hi.txt 2>&1 | FileCheck /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/llvm/compiler-rt/test/fuzzer/fuzzer-timeout.test --check-prefix=SingleInputTimeoutTest
+ not /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/stage1/runtimes/runtimes-bins/compiler-rt/test/fuzzer/S390XDefaultLinuxConfig/Output/fuzzer-timeout.test.tmp-TimeoutTest -timeout=1 /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/llvm/compiler-rt/test/fuzzer/hi.txt
+ FileCheck /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/llvm/compiler-rt/test/fuzzer/fuzzer-timeout.test --check-prefix=SingleInputTimeoutTest
RUN: at line 16: /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/stage1/runtimes/runtimes-bins/compiler-rt/test/fuzzer/S390XDefaultLinuxConfig/Output/fuzzer-timeout.test.tmp-TimeoutTest -timeout=1 -timeout_exitcode=0
+ /home/uweigand/sandbox/buildbot/clang-s390x-linux-lnt/stage1/runtimes/runtimes-bins/compiler-rt/test/fuzzer/S390XDefaultLinuxConfig/Output/fuzzer-timeout.test.tmp-TimeoutTest -timeout=1 -timeout_exitcode=0
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 2096456004
INFO: Loaded 1 modules   (13 inline 8-bit counters): 13 [0x2aa3dde7e48, 0x2aa3dde7e55), 
INFO: Loaded 1 PC tables (13 PCs): 13 [0x2aa3dde7e58,0x2aa3dde7f28), 
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2	INITED cov: 2 ft: 2 corp: 1/1b exec/s: 0 rss: 32Mb
#2384	NEW    cov: 3 ft: 3 corp: 2/5b lim: 25 exec/s: 0 rss: 32Mb L: 4/4 MS: 2 ChangeBit-InsertRepeatedBytes-
#2445	REDUCE cov: 3 ft: 3 corp: 2/4b lim: 25 exec/s: 0 rss: 32Mb L: 3/3 MS: 1 EraseBytes-
#2446	REDUCE cov: 3 ft: 3 corp: 2/3b lim: 25 exec/s: 0 rss: 32Mb L: 2/2 MS: 1 EraseBytes-
#2472	REDUCE cov: 4 ft: 4 corp: 3/4b lim: 25 exec/s: 0 rss: 32Mb L: 1/2 MS: 1 EraseBytes-
#8948	REDUCE cov: 5 ft: 5 corp: 4/6b lim: 86 exec/s: 0 rss: 33Mb L: 2/2 MS: 1 ChangeByte-
#9038	NEW    cov: 6 ft: 6 corp: 5/13b lim: 86 exec/s: 0 rss: 33Mb L: 7/7 MS: 5 CMP-CrossOver-CrossOver-CrossOver-EraseBytes- DE: "\377\377\377\377"-
#9239	REDUCE cov: 6 ft: 6 corp: 5/12b lim: 86 exec/s: 0 rss: 33Mb L: 6/6 MS: 1 EraseBytes-
#9315	REDUCE cov: 6 ft: 6 corp: 5/9b lim: 86 exec/s: 0 rss: 33Mb L: 3/3 MS: 1 EraseBytes-
ALARM: working on the last Unit for 1 seconds
       and the timeout value is 1 (use -timeout=N to change)
MS: 2 CopyPart-ChangeByte-; base unit: 94dd9e08c129c785f7f256e82fbe0a30e6d1ae40
0x48,0x69,0x21,0x69,
Hi!i
artifact_prefix='./'; Test unit written to ./timeout-c07078879f59203eeb77b1e2390b60cde5634ce6
Base64: SGkhaQ==
==3356045== ERROR: libFuzzer: timeout after 1 seconds
AddressSanitizer:DEADLYSIGNAL
=================================================================
AddressSanitizer:DEADLYSIGNAL
=================================================================
AddressSanitizer: CHECK failed: asan_report.cpp:199 "((current_error_.kind)) == ((kErrorKindInvalid))" (0x1, 0x0) (tid=3356045)
    <empty stack>

MS: 2 CopyPart-ChangeByte-; base unit: 94dd9e08c129c785f7f256e82fbe0a30e6d1ae40
0x48,0x69,0x21,0x69,
Hi!i
...

AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
Define subtarget features for atomic fmin/fmax support.

The flat/global support is a real messe. We had float/double support at
the beginning in gfx6 and gfx7. gfx8 removed these. gfx10 reintroduced them.
gfx11 removed the f64 versions again.

gfx9 partially reintroduced them, in gfx90a and gfx940 but only for f64.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang Clang issues not falling into any other category llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants