[DirectX] Select of two different RawBuffer can not easily be reduced to a callInst. #152348

@farzonl

Description

This bug is very similar to #140819.
However, instead of an alloca/store-like pattern, we have two different RawBuffer resources, and a select picks between them.

Possible Solution

One option is to replace the select with a branch and then create two calls, one for each RawBuffer, so that each call uses a handle that comes directly from a casthandle call.
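The intent of this rewrite can be illustrated with a small model outside LLVM. This is only a sketch under stated assumptions: `Buffer`, `loadViaSelect`, and `loadViaBranch` are hypothetical names, a `std::array` stands in for a RawBuffer resource, and nothing here is the actual lowering code. It shows that branching first and loading once per arm gives the same result as loading through a selected buffer, while each load names its buffer directly.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy stand-in for a RawBuffer resource (hypothetical, not an LLVM type).
using Buffer = std::array<float, 4>;

// Shape of the crashing pattern: pick a buffer with a select, then load
// through the selected value. The load no longer sees which buffer it uses.
float loadViaSelect(bool Cond, const Buffer &A, const Buffer &B,
                    std::size_t Idx) {
  assert(Idx < A.size() && "index out of range");
  const Buffer &Chosen = Cond ? A : B;
  return Chosen[Idx];
}

// Shape of the proposed rewrite: branch on the condition and issue one load
// per arm, so every load refers to a directly known buffer.
float loadViaBranch(bool Cond, const Buffer &A, const Buffer &B,
                    std::size_t Idx) {
  assert(Idx < A.size() && "index out of range");
  if (Cond)
    return A[Idx];
  return B[Idx];
}
```

In IR terms, the two `return` statements would correspond to two rawBufferLoad calls in separate blocks whose results are merged afterwards, but how the pass should build that control flow is left open here.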

Assert location

 frame #4: 0x00000001003b7430 clang-dxc`(anonymous namespace)::OpLowerer::lowerIntrinsics() [inlined] decltype(auto) llvm::cast<llvm::CallInst, llvm::Value>(Val=0x000060f00000a3c0) at Casting.h:578:3 [opt]
   575 
   576  template <typename To, typename From>
   577  [[nodiscard]] inline decltype(auto) cast(From *Val) {
-> 578    assert(isa<To>(Val) && "cast<Ty>() argument of incompatible type!");
   579    return CastInfo<To, From *>::doCast(Val);
   580  }
   581 

As before, the issue is that we were expecting a CallInst, but the operand of the cast is not one.

frame #5: 0x00000001003b73e8 clang-dxc`(anonymous namespace)::OpLowerer::lowerIntrinsics() at DXILOpLowering.cpp:177:23 [opt]
   174        }
   175        // Otherwise, we're the second handle in a pair. Forward the arguments and
   176        // remove the (second) cast.
-> 177        CallInst *Def = cast<CallInst>(Cast->getOperand(0));
   178        assert(Def->getIntrinsicID() == Intrinsic::dx_resource_casthandle &&
   179               "Unbalanced pair of temporary handle casts");
   180        Cast->replaceAllUsesWith(Def->getOperand(0));
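The failure mode in this frame can be reproduced in miniature without LLVM. The sketch below is an assumption-laden model: `Value`, `CallInst`, `SelectInst`, `dynCast`, and `classifyCastOperand` are toy stand-ins written for this example, not LLVM's classes. It shows how a check in the style of `dyn_cast` returns null for a select operand, where a checked `cast` would assert.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy value hierarchy (hypothetical; mirrors only the kind-tag idea).
struct Value {
  enum Kind { Call, Select };
  explicit Value(Kind TheKind) : TheKind(TheKind) {}
  virtual ~Value() = default;
  Kind getKind() const { return TheKind; }

private:
  Kind TheKind;
};

struct CallInst : Value {
  explicit CallInst(std::string Callee)
      : Value(Call), Callee(std::move(Callee)) {}
  std::string Callee;
  static bool classof(const Value *V) { return V->getKind() == Call; }
};

struct SelectInst : Value {
  SelectInst(Value *TrueV, Value *FalseV)
      : Value(Select), TrueV(TrueV), FalseV(FalseV) {}
  Value *TrueV;
  Value *FalseV;
  static bool classof(const Value *V) { return V->getKind() == Select; }
};

// Returns nullptr on a kind mismatch instead of asserting.
template <typename To> To *dynCast(Value *V) {
  return To::classof(V) ? static_cast<To *>(V) : nullptr;
}

// Reports what feeds the second handle cast: a call or a select.
std::string classifyCastOperand(Value *Op) {
  if (auto *Def = dynCast<CallInst>(Op))
    return "call:" + Def->Callee;
  if (dynCast<SelectInst>(Op))
    return "select";
  return "unknown";
}
```

A null-returning check like this only detects the select; the lowering would still need to handle that case, for example by looking at both arms of the select.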

Crashing instruction:

%. = select i1 %cmp14.not.i, target("dx.RawBuffer", half, 1, 0) %15, target("dx.RawBuffer", half, 1, 0) %11

Basic Block

if.then.i:                                        ; preds = %entry
  %8 = call %dx.types.Handle @dx.op.createHandle(i32 57, i8 1, i32 3, i32 3, i1 false) #2
  %9 = call target("dx.RawBuffer", half, 1, 0) @llvm.dx.resource.casthandle.tdx.RawBuffer_f16_1_0t.s_dx.types.Handles(%dx.types.Handle %8)
  %10 = call %dx.types.Handle @dx.op.createHandle(i32 57, i8 1, i32 2, i32 2, i1 false) #2
  %11 = call target("dx.RawBuffer", half, 1, 0) @llvm.dx.resource.casthandle.tdx.RawBuffer_f16_1_0t.s_dx.types.Handles(%dx.types.Handle %10)
  %12 = call %dx.types.Handle @dx.op.createHandle(i32 57, i8 1, i32 1, i32 1, i1 false) #2
  %13 = call target("dx.RawBuffer", i32, 1, 0) @llvm.dx.resource.casthandle.tdx.RawBuffer_i32_1_0t.s_dx.types.Handles(%dx.types.Handle %12)
  %14 = call %dx.types.Handle @dx.op.createHandle(i32 57, i8 1, i32 0, i32 0, i1 false) #2
  %15 = call target("dx.RawBuffer", half, 1, 0) @llvm.dx.resource.casthandle.tdx.RawBuffer_f16_1_0t.s_dx.types.Handles(%dx.types.Handle %14)
  %16 = call %dx.types.Handle @llvm.dx.resource.casthandle.s_dx.types.Handles.tdx.CBuffer_tdx.Layout_s___cblayout_Constantss_80_0_16_32_48_64_68_72_76tt(target("dx.CBuffer", target("dx.Layout", %__cblayout_Constants, 80, 0, 16, 32, 48, 64, 68, 72, 76)) %0)
  %.load2653 = call %dx.types.CBufRet.i32 @dx.op.cbufferLoadLegacy.i32(i32 59, %dx.types.Handle %16, i32 4) #2
  %.extract27 = extractvalue %dx.types.CBufRet.i32 %.load2653, 0
  %mul.i = mul i32 %.extract27, %add1.i
  %17 = call %dx.types.ResRet.i32 @dx.op.rawBufferLoad.i32(i32 139, %dx.types.Handle %12, i32 %mul.i, i32 0, i8 1, i32 4)
  %18 = extractvalue %dx.types.ResRet.i32 %17, 0
  %19 = call %dx.types.Handle @llvm.dx.resource.casthandle.s_dx.types.Handles.tdx.CBuffer_tdx.Layout_s___cblayout_Constantss_80_0_16_32_48_64_68_72_76tt(target("dx.CBuffer", target("dx.Layout", %__cblayout_Constants, 80, 0, 16, 32, 48, 64, 68, 72, 76)) %0)
  %.load2952 = call %dx.types.CBufRet.i32 @dx.op.cbufferLoadLegacy.i32(i32 59, %dx.types.Handle %19, i32 4) #2
  %.extract30 = extractvalue %dx.types.CBufRet.i32 %.load2952, 1
  %cmp14.not.i = icmp ugt i32 %18, %.extract30
  %20 = call %dx.types.Handle @llvm.dx.resource.casthandle.s_dx.types.Handles.tdx.CBuffer_tdx.Layout_s___cblayout_Constantss_80_0_16_32_48_64_68_72_76tt(target("dx.CBuffer", target("dx.Layout", %__cblayout_Constants, 80, 0, 16, 32, 48, 64, 68, 72, 76)) %0)
  %.load54 = call %dx.types.CBufRet.i32 @dx.op.cbufferLoadLegacy.i32(i32 59, %dx.types.Handle %20, i32 3) #2
  %.extract = extractvalue %dx.types.CBufRet.i32 %.load54, 0
  %.extract22 = extractvalue %dx.types.CBufRet.i32 %.load54, 1
  %.extract23 = extractvalue %dx.types.CBufRet.i32 %.load54, 2
  %.extract24 = extractvalue %dx.types.CBufRet.i32 %.load54, 3
  %21 = mul i32 0, %.extract
  %dx.mad846 = call i32 @dx.op.tertiary.i32(i32 49, i32 0, i32 %.extract22, i32 %21) #3
  %dx.mad945 = call i32 @dx.op.tertiary.i32(i32 49, i32 %add1.i, i32 %.extract23, i32 %dx.mad846) #3
  %dx.mad1044 = call i32 @dx.op.tertiary.i32(i32 49, i32 %add.i, i32 %.extract24, i32 %dx.mad945) #3
  %. = select i1 %cmp14.not.i, target("dx.RawBuffer", half, 1, 0) %15, target("dx.RawBuffer", half, 1, 0) %11
  %22 = call %dx.types.Handle @llvm.dx.resource.casthandle.s_dx.types.Handles.tdx.CBuffer_tdx.Layout_s___cblayout_Constantss_80_0_16_32_48_64_68_72_76tt(target("dx.CBuffer", target("dx.Layout", %__cblayout_Constants, 80, 0, 16, 32, 48, 64, 68, 72, 76)) %0)
  %previousStrides.val.load59 = call %dx.types.CBufRet.i32 @dx.op.cbufferLoadLegacy.i32(i32 59, %dx.types.Handle %22, i32 0) #2
  %previousStrides.val.extract = extractvalue %dx.types.CBufRet.i32 %previousStrides.val.load59, 0
  %previousStrides.val.extract11 = extractvalue %dx.types.CBufRet.i32 %previousStrides.val.load59, 1
  %previousStrides.val.extract12 = extractvalue %dx.types.CBufRet.i32 %previousStrides.val.load59, 2
  %previousStrides.val.extract13 = extractvalue %dx.types.CBufRet.i32 %previousStrides.val.load59, 3
  %23 = call %dx.types.Handle @llvm.dx.resource.casthandle.s_dx.types.Handles.tdx.CBuffer_tdx.Layout_s___cblayout_Constantss_80_0_16_32_48_64_68_72_76tt(target("dx.CBuffer", target("dx.Layout", %__cblayout_Constants, 80, 0, 16, 32, 48, 64, 68, 72, 76)) %0)
  %hiddenStrides.val.load58 = call %dx.types.CBufRet.i32 @dx.op.cbufferLoadLegacy.i32(i32 59, %dx.types.Handle %23, i32 1) #2
  %hiddenStrides.val.extract = extractvalue %dx.types.CBufRet.i32 %hiddenStrides.val.load58, 0
  %hiddenStrides.val.extract15 = extractvalue %dx.types.CBufRet.i32 %hiddenStrides.val.load58, 1
  %hiddenStrides.val.extract16 = extractvalue %dx.types.CBufRet.i32 %hiddenStrides.val.load58, 2
  %hiddenStrides.val.extract17 = extractvalue %dx.types.CBufRet.i32 %hiddenStrides.val.load58, 3
  %.i0 = select i1 %cmp14.not.i, i32 %previousStrides.val.extract, i32 %hiddenStrides.val.extract
  %.i1 = select i1 %cmp14.not.i, i32 %previousStrides.val.extract11, i32 %hiddenStrides.val.extract15
  %.i2 = select i1 %cmp14.not.i, i32 %previousStrides.val.extract12, i32 %hiddenStrides.val.extract16
  %.i3 = select i1 %cmp14.not.i, i32 %previousStrides.val.extract13, i32 %hiddenStrides.val.extract17
  %24 = mul i32 0, %.i0
  %dx.mad49 = call i32 @dx.op.tertiary.i32(i32 49, i32 0, i32 %.i1, i32 %24) #3
  %dx.mad648 = call i32 @dx.op.tertiary.i32(i32 49, i32 %add1.i, i32 %.i2, i32 %dx.mad49) #3
  %dx.mad747 = call i32 @dx.op.tertiary.i32(i32 49, i32 %add.i, i32 %.i3, i32 %dx.mad648) #3
  %25 = call %dx.types.Handle @llvm.dx.resource.casthandle.s_dx.types.Handles.tdx.RawBuffer_f16_1_0t(target("dx.RawBuffer", half, 1, 0) %.)
  %26 = call %dx.types.ResRet.f16 @dx.op.rawBufferLoad.f16(i32 139, %dx.types.Handle %25, i32 %dx.mad747, i32 0, i8 1, i32 2)
  %27 = extractvalue %dx.types.ResRet.f16 %26, 0
  call void @dx.op.rawBufferStore.f16(i32 140, %dx.types.Handle %8, i32 %dx.mad1044, i32 0, half %27, half undef, half undef, half undef, i8 1, i32 2)
  br label %_Z6CSMainDv3_j.exit

Crash Stack

clang-dxc: error: clang frontend command failed with exit code 134 (use -v to see invocation)
clang version 22.0.0git (git@github.com:llvm/llvm-project.git 6d25d1d5508b7ba9aca77188a175d0a6348c581f)
Target: dxilv1.2-unknown-shadermodel6.2-compute
Thread model: posix
InstalledDir: /llvm_rel_with_deb_info/bin
Build config: +assertions, +asan
Cause instances: 60
Cause:
0.	Program arguments: /llvm_rel_with_deb_info/bin/clang-dxc ../DirectML/Product/Shaders/Generated/Quantize_256_4_float16_native_accum32_int4_packed32.hlsl -E CSMain -T cs_6_2  -enable-16bit-types  -O3 -D DXC_COMPILER=1 -D __SHADER_TARGET_MAJOR=6 -D __SHADER_TARGET_MINOR=2 -Vd -I ../DirectML/Product/Shaders/ -Fo ClangDML/tools/validation/Quantize_256_4_float16_native_accum32_int4_packed32.dat
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'DXIL Embedder' on module '../DirectML/Product/Shaders/Generated/Quantize_256_4_float16_native_accum32_int4_packed32.hlsl'.
----------------------------------------
============================================================
Stack Traces:
#0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (llvm_rel_with_deb_info/bin/clang-21+)
 #1 llvm::sys::CleanupOnSignal(unsigned long) (llvm_rel_with_deb_info/bin/clang-21+)
 #2 CrashRecoverySignalHandler(int) (llvm_rel_with_deb_info/bin/clang-21+)
 #3 (/usr/lib/system/libsystem_platform.dylib+)
 #4 (/usr/lib/system/libsystem_pthread.dylib+)
 #5 (/usr/lib/system/libsystem_c.dylib+)
 #6 (/usr/lib/system/libsystem_c.dylib+)
 #7 (anonymous namespace)::OpLowerer::lowerIntrinsics() (/llvm_rel_with_deb_info/bin/clang-21+)
 #8 (anonymous namespace)::DXILOpLoweringLegacy::runOnModule(llvm::Module&) (/llvm_rel_with_deb_info/bin/clang-21+)
 #9 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/llvm_rel_with_deb_info/bin/clang-21+)
#10 clang::emitBackendOutput(clang::CompilerInstance&, clang::CodeGenOptions&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/llvm_rel_with_deb_info/bin/clang-21+)
#11 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (llvm_rel_with_deb_info/bin/clang-21+)
#12 clang::ParseAST(clang::Sema&, bool, bool) (lvm_rel_with_deb_info/bin/clang-21+)
#13 clang::CodeGenAction::ExecuteAction() (llvm_rel_with_deb_info/bin/clang-21+)
#14 clang::FrontendAction::Execute() (/llvm_rel_with_deb_info/bin/clang-21+)
#15 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/llvm_rel_with_deb_info/bin/clang-21+)
#16 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/llvm_rel_with_deb_info/bin/clang-21+)
#17 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/llvm_rel_with_deb_info/bin/clang-21+)
#18 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) (/llvm_rel_with_deb_info/bin/clang-21+)
#19 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::__1::optional<llvm::StringRef>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, bool*) const::$_0>(long) (/llvm_rel_with_deb_info/bin/clang-21+)
#20 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/llvm_rel_with_deb_info/bin/clang-21+)
#21 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::__1::optional<llvm::StringRef>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, bool*) const (/llvm_rel_with_deb_info/bin/clang-21+)
#22 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/llvm_rel_with_deb_info/bin/clang-21+)
#23 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*>>&, bool) const (/llvm_rel_with_deb_info/bin/clang-21+)
#24 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*>>&) (/llvm_rel_with_deb_info/bin/clang-21+)
#25 clang_main(int, char**, llvm::ToolContext const&) (/llvm_rel_with_deb_info/bin/clang-21+)
#26 main (/llvm_rel_with_deb_info/bin/clang-21+)
#27 
clang-dxc: error: clang frontend command failed with exit code 134 (use -v to see invocation)
clang version 22.0.0git (git@github.com:llvm/llvm-project.git 6d25d1d5508b7ba9aca77188a175d0a6348c581f)
Target: dxilv1.2-unknown-shadermodel6.2-compute
Thread model: posix
InstalledDir: /llvm_rel_with_deb_info/bin
Build config: +assertions, +asan
Cause instances: 2
Cause:
0.	Program arguments: /llvm_rel_with_deb_info/bin/clang-dxc ../DirectML/Product/Shaders/Generated/RNNOverwrite_16.hlsl -E CSMain -T cs_6_2  -enable-16bit-types  -O3 -D DXC_COMPILER=1 -D __SHADER_TARGET_MAJOR=6 -D __SHADER_TARGET_MINOR=2 -Vd -I ../DirectML/Product/Shaders/ -Fo /ClangDML/tools/validation/RNNOverwrite_16.dat
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'DXIL Op Lowering' on module '../DirectML/Product/Shaders/Generated/RNNOverwrite_16.hlsl'.
----------------------------------------
============================================================

Status: Closed