Skip to content

Conversation

@pow2clk
Copy link
Collaborator

@pow2clk pow2clk commented Apr 1, 2025

Enables the declaration of long vector types for raw buffers, the lowering of those
and traditional vectors in loads and stores maintaining the native types with
new dxil ops along with validation and testing support of the same.

Allow declaring long vector rawbuffer resources.
Previously disallowed along with other global types, this provides a
mechanism for indicating which buffers are raw and allowing them to
contain long vectors, continuing to produce an error for other resource
types verified by existing tests

Introduce native vector DXIL load/store intrinsics.
Add new raw buffer vector load/store intrinsics using the new vector overload types.
Include them in validation associated with similar load/stores

Lower native vector raw buffers load/stores into new ops.
When the loaded/stored type is a vector of more than 1 element, the
shader model is 6.9 or higher, and the operation is on a raw buffer,
enable the generation of a native vector raw buffer load or store.

Incidental removal of unused parameter in load translation and some refactoring
of the lowering to flow better with the new resret types.

add validation and compute shader tests

Vector to scalar raw buffer load lowering pass
Native vector loads and stores are generated for 6.9 targets and above.
This includes the 6.x target used when compiling to libraries. This adds
a pass run when linking that will lower the vector operations to scalar
operations for shader models that don't have native vector support. This
allows libraries compiled for supportive shader models to be linked to
targets without support.

Greg Roth added 4 commits April 1, 2025 15:13
Previously disallowed along with other global types, this provides a
mechanism for indicating which buffers are raw and allowing them to
contain long vectors, continuing to produce an error for other resource
types verified by existing tests
Add new raw buffer vector load/store intrinsics using the new vector overload types.
Include them in validation associated with similar load/stores
When the loaded/stored type is a vector of more than 1 element, the
shader model is 6.9 or higher, and the operation is on a raw buffer,
enable the generation of a native vector raw buffer load or store.

Incidental removal of unused parameter in load translation

add validation and compute shader tests
Native vector loads and stores are generated for 6.9 targets and above.
This includes the 6.x target used when compiling to libraries. This adds
a pass run when linking that will lower the vector operations to scalar
operations for shader models that don't have native vector support. This
allows libraries compiled for supportive shader models to be linked to
targets without support.
@pow2clk pow2clk requested a review from a team as a code owner April 1, 2025 21:19
@github-actions
Copy link
Contributor

github-actions bot commented Apr 1, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@pow2clk
Copy link
Collaborator Author

pow2clk commented Apr 1, 2025

REVIEW GUIDANCE! The first four individual commits were designed to be reviewed individually and might make the review more pleasant. That said, I don't expect the full review would be too difficult either.

@pow2clk
Copy link
Collaborator Author

pow2clk commented Apr 1, 2025

I clang-formatted this to within an inch of its life. No idea why it didn't fail until now.

bogner
bogner previously approved these changes Apr 1, 2025
Copy link
Collaborator

@bogner bogner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor things but this LGTM

return "DXIL scalarize vector load/stores";
}

bool runOnModule(Module &M) override {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worthwhile to split this into two classes - The pass DxilScalarizeLoadStores, and a separate (loca,/ anonymous namespace) VectorLoadStoreScalarizer. This separates concerns nicely and avoids the awkward situation of mutating the class member m_DM on every call to runOnModule (It also means you can avoid the polish notation for the member variable...)

Then the pass just needs:

bool runOnModule(Module &M) override {
  DxilModule &DM = M.GetOrCreateDxilModule();
  if (DM.GetShaderModel()->IsSM69Plus())
      return false;
  return VectorLoadStoreScalarizer(DM).scalarize();
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the member dxilmodule was actually unnecessary and I'd meant to remove it. Most of the simplification of runOnModule you've added here would just push the iteration down to the scalarization function. I don't mind where it goes, but the benefit, small though it is, of iterating in runOnModule is I only have to iterate through the functions once.

I'm perfectly willing to consider it, but I think I remain unclear on the benefit of having separate classes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've split the load and store scalarizations into two non-method static functions.

Comment on lines 55 to 56
for (auto F = M.functions().begin(), E = M.functions().end(); F != E;) {
Function *Func = &*(F++);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(side comment - no change needed)
It's too bad we don't have upstream llvm's make_early_inc_range available here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, and their version of filecheck.

Args.emplace_back(VecLd.get_alignment()); // Alignment @5.

// Set offset to increment depending on whether the real offset is defined.
unsigned OffsetIdx = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some debate about this, so feel free to use your own preference, but I would leave OffsetIdx uninitialized here since that will give a nice compiler warning if you forget to set it in one of the branches of the conditional below.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

@tex3d tex3d self-assigned this Apr 1, 2025
Greg Roth added 2 commits April 1, 2025 17:56
Turned out to be unneeded

This reverts commit cdaca3f.
Use IS_BASIC_OBJECT to allow global raw buffers with long vectors. The rawbuffer bits weren't needed.

Remove unused DxilModule from vector load store scalarize pass

Leave sure-to-be-assigned variable uninitialized

Add test that long vectors in pre-6.9 StructuredBuffers produces an
error

Exclude the final 1.9 dxil op code for now.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a request, but a suggestion for the future:

This PR would be significantly smaller and easier to review if this pass had been added in its own PR with its own tests and no other changes. A subsequent PR could add it to the pipeline and add the additional required changes to integrate it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I have a large block of code to commit that is interrelated, but separable and I'm unsure how best to economize reviewer time in the perennial conflict between doing a lot of reviews versus one big one, I try to have the best of both by dividing the separable portions into sensible commits and putting them in one PR. That's what I did here as I explained in this comment: #7292 (comment)

I've seen evidence that these comments are often missed though. I wonder if putting an inline comment at the top of the change would be more effective.

The individual PR you describe is essentially this commit: 27f0e73

A lot of this is tests and this change unblocked a lot of testing and it was always going to open the floodgates of tests for the combination of elements it enabled.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are solutions for doing stacked PRs (https://github.com/modular/stack-pr, https://graphite.dev/blog/stacked-prs, and others). The GitHub UI for "commits" is not a good tool for reviewing completely separable changes, and relying on someone seeing a comment in the UI is lossy at best.

New passes are always separable because by LLVM's design they can be run with IR inputs and outputs and be completely independent of other changes. They make a great boundary for separating pull requests.

Copy link
Collaborator Author

@pow2clk pow2clk Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried it? I'd be interested to know what happens when the PR that is depended on changes. Faced with this situation again, I was thinking I'd just create a PR against the topic branch of the first PR.

That's not really the problem I meant to solve by making this one change with logically separated PRs though. I wanted this to go in together both to economize reviewer time having to look at one change and also the interdependencies. Several things could have gone in before others, but incompletely tested.

I'm not saying I made the right decision here, but I see other problems with opposing using logically separated commits so as to not have to read the comments of a PR before reviewing.

// Collect the information required to break this into scalar ops from args.
DxilInst_RawBufferVectorLoad VecLd(CI);
OP::OpCode OpCode = OP::OpCode::RawBufferLoad;
llvm::Constant *opArg = Builder.getInt32((unsigned)OpCode);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
llvm::Constant *opArg = Builder.getInt32((unsigned)OpCode);
llvm::Constant *OpArg = Builder.getInt32((unsigned)OpCode);

New code should follow LLVM coding standards.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed this one.


// CHECK: %dx.types.ResRet.[[VTY:v[0-9]*[a-z][0-9][0-9]]] = type { <[[NUM:[0-9]*]] x [[TYPE:[a-z_0-9]*]]>, i32 }

ByteAddressBuffer RoByBuf : register(t1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take or leave: the leading whitespace bothers me greatly... I know we don't really have a policy on formatting HLSL but this reminds me too much of the objective-c tendency to align colons that I've always found grating.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention here was to keep the corresponding portions aligned so that, at a glance, it's clear that these are both ByteAddressBuffers and one is RW though I wasn't consistent with that elsewhere. I don't mind removing it.

RWByteAddressBuffer RwByBuf : register(u1);

StructuredBuffer< vector<TYPE, NUM> > RoStBuf : register(t2);
RWStructuredBuffer< vector<TYPE, NUM> > RwStBuf : register(u2);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also take or leave: I also find some of the double-spaces odd... I know we have have C++03-ish templates so you need a space, but extra spaces seem to just make lines longer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's the unintended double space after the > that bothers you or the unnecessary, but I think helpful though inconsistently applied leading space before vector.

@@ -0,0 +1,708 @@
// RUN: %dxc -HV 2018 -T cs_6_9 -DTYPE=float -DNUM=2 %s | FileCheck %s --check-prefixes=CHECK,NODBL,NOINT
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also be testing vector<T,1> here? Or potentially elsewhere?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good suggestion as they should be treated like scalars. Given how different the output will be, I'd prefer to do it elsewhere. I had intended to do the same thing to the operators-vec1s test as this though originally I'd hoped they could remain part of the same tests, but the reorganization just got too involved and I've been critiqued for how complicated I make my CHECK lines in such cases.

bool Changed = false;

hlsl::OP *HlslOP = DM.GetOP();
for (auto F = M.functions().begin(), E = M.functions().end(); F != E;) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can iterate the overloads for specific ops you want instead with HlslOP::GetOpFuncList. For instance:

for (auto it : hlslOP->GetOpFuncList(DXIL::OpCode::AnnotateHandle)) {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I did originally. I ran into a lot of problems with it because GetOpFuncList was returning a list with duplicate entries for the same function leading to double deletion. I have a note to investigate that further, but in the meantime, rather than create a list of functions to delete that I iterate through again at the end to just delete them all, I opted to use the same iteration method that the cousin pass TranslateRawBuffer uses. I don't feel strongly about it though. I could probably do as the code you cite does and rely on later passes to eliminate the unused functions for me. I don't have an issue with that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, for one thing, erasing a function during iteration of this list will erase the entry from the list. Since this is a SmallMapVector, the iterator points into the vector, which will be invalidated by any modification. That doesn't seem like it should result specifically in duplicate entries and double-delete, but it is in UB territory.

If the original list contains duplicate entries, that is a cause for concern. We shouldn't allow that.

if (OpClass == DXIL::OpCodeClass::RawBufferVectorLoad) {
for (auto U = Func->user_begin(), UE = Func->user_end(); U != UE;) {
CallInst *CI = cast<CallInst>(*(U++));
scalarizeVectorLoad(HlslOP, M.getDataLayout(), CI);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's always a good form when iterating DXIL op class overload function users to check the opcode (even though you know in this case it should be the only one for this OpCodeClass):

Suggested change
scalarizeVectorLoad(HlslOP, M.getDataLayout(), CI);
if (HlslOP->GetDxilOpFuncCallInst(CI) !=
hlsl::OP::OpCode::RawBufferVectorLoad)
continue;
scalarizeVectorLoad(HlslOP, M.getDataLayout(), CI);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably this wouldn't be required if I use the GetOpFuncList approach?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that you do technically have to check, even with GetOpFuncList. That's because this returns the list of overload functions for the OpCodeClass corresponding to the OpCode. Basically, that's all it could do, since different DXIL operations use the same function declaration when they share the same OpCodeClass.

I think the best way is to get the OpCode using: hlsl::OP::GetDxilOpFuncCallInst(CI) == hlsl::OP::OpCode::RawBufferVectorLoad (this avoids a redundant IsDxilOpFuncCallInst check on NDEBUG at least).

I looked at other code using this function and basically nobody is checking the OpCode of the CallInst users. I'm considering filing an issue about this, but I don't think there are any real bugs caused by this at the moment. Basically, you're safe if there's a 1:1 mapping between OpCode and OpCodeClass, which happens to be true in all use cases. If not, since this always lists all the overloads for the OpCodeClass that corresponds to the OpCode passed in, you could be iterating CallInst users that aren't calling with the same OpCode you're looking for. I found one use of GetOpFuncList that looks like it could potentially have this problem in OP::IsDxilOpUsed, but this is only called with OpCodes that are safe, so I don't think it doesn't cause any real bugs in practice (yet).

I think it would probably be better to make GetOpFuncList take the OpCodeClass instead of the OpCode. Then you're forced to HlslOp->GetOpFuncList(hlsl::OP::GetOpCodeClass(MyOpCode)) and it's more obvious that you need to check the OpCode for each CallInst user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding using hlsl::OP::GetDxilOpFuncCallInst, I guess hlsl::OP::getOpCode is just fine actually.

Copy link
Contributor

@tex3d tex3d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. One correctness issue with the alignment for the load op, and a few minor comments.

Copy link
Collaborator Author

@pow2clk pow2clk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the feedback! I'll respond to most of it in the next commit. The vec1 test will have its own commit as it'll take just a bit longer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I have a large block of code to commit that is interrelated, but separable and I'm unsure how best to economize reviewer time in the perennial conflict between doing a lot of reviews versus one big one, I try to have the best of both by dividing the separable portions into sensible commits and putting them in one PR. That's what I did here as I explained in this comment: #7292 (comment)

I've seen evidence that these comments are often missed though. I wonder if putting an inline comment at the top of the change would be more effective.

The individual PR you describe is essentially this commit: 27f0e73

A lot of this is tests and this change unblocked a lot of testing and it was always going to open the floodgates of tests for the combination of elements it enabled.

return "DXIL scalarize vector load/stores";
}

bool runOnModule(Module &M) override {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've split the load and store scalarizations into two non-method static functions.

bool Changed = false;

hlsl::OP *HlslOP = DM.GetOP();
for (auto F = M.functions().begin(), E = M.functions().end(); F != E;) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I did originally. I ran into a lot of problems with it because GetOpFuncList was returning a list with duplicate entries for the same function leading to double deletion. I have a note to investigate that further, but in the meantime, rather than create a list of functions to delete that I iterate through again at the end to just delete them all, I opted to use the same iteration method that the cousin pass TranslateRawBuffer uses. I don't feel strongly about it though. I could probably do as the code you cite does and rely on later passes to eliminate the unused functions for me. I don't have an issue with that.

if (OpClass == DXIL::OpCodeClass::RawBufferVectorLoad) {
for (auto U = Func->user_begin(), UE = Func->user_end(); U != UE;) {
CallInst *CI = cast<CallInst>(*(U++));
scalarizeVectorLoad(HlslOP, M.getDataLayout(), CI);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably this wouldn't be required if I use the GetOpFuncList approach?

RWByteAddressBuffer RwByBuf : register(u1);

StructuredBuffer< vector<TYPE, NUM> > RoStBuf : register(t2);
RWStructuredBuffer< vector<TYPE, NUM> > RwStBuf : register(u2);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's the unintended double space after the > that bothers you or the unnecessary, but I think helpful though inconsistently applied leading space before vector.

@@ -0,0 +1,708 @@
// RUN: %dxc -HV 2018 -T cs_6_9 -DTYPE=float -DNUM=2 %s | FileCheck %s --check-prefixes=CHECK,NODBL,NOINT
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good suggestion as they should be treated like scalars. Given how different the output will be, I'd prefer to do it elsewhere. I had intended to do the same thing to the operators-vec1s test as this though originally I'd hoped they could remain part of the same tests, but the reorganization just got too involved and I've been critiqued for how complicated I make my CHECK lines in such cases.

Comment on lines 55 to 56
for (auto F = M.functions().begin(), E = M.functions().end(); F != E;) {
Function *Func = &*(F++);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, and their version of filecheck.

Separate out the load store lowering functions from the pass class
 in vector scalarization pass

Calculate load size on scalar element rather than resret element type to
get correct alignment on scalars instead of on the whole vector.
Adjust various tests accordingly that depended on or used code generated
with the incorrect alignment, which always just hit the max of 8.

Change iteration through matching intrinsics in link scalarization pass

Capitalize varibles according to coding standards

Include and remove whitespace in tests.
@damyanp damyanp removed the request for review from a team April 2, 2025 21:49
tex3d
tex3d previously approved these changes Apr 3, 2025
Greg Roth added 2 commits April 3, 2025 03:26
Add cs test for vec1s. Reformat other operator tests to be more
consistent.
validate native vector loads and stores for properly defined parameters
of the correct type. Add tests for both vector load/stores and the
original scalar load/stores since they share a lot of validation code.
}
LLVM_FALLTHROUGH;
case DXIL::OpCode::RawBufferVectorStore: {
Value *handle =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Value *handle =
Value *Handle =

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is extremely consistent. All but a few exceptions are not capitalized. I think the golden rule applies.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really?

That's just up to line 101.

if (resClass != DXIL::ResourceClass::UAV)
ValCtx.EmitInstrError(CI, ValidationRule::InstrResourceClassForUAVStore);

unsigned alignIdx = DXIL::OperandIndex::kRawBufferStoreAlignmentOpIdx;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unsigned alignIdx = DXIL::OperandIndex::kRawBufferStoreAlignmentOpIdx;
unsigned AlignIdx = DXIL::OperandIndex::kRawBufferStoreAlignmentOpIdx;

unsigned alignIdx = DXIL::OperandIndex::kRawBufferStoreAlignmentOpIdx;
if (DXIL::OpCode::RawBufferVectorStore == opcode) {
alignIdx = DXIL::OperandIndex::kRawBufferVectorStoreAlignmentOpIdx;
unsigned valueIx = DXIL::OperandIndex::kRawBufferVectorStoreValOpIdx;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unsigned valueIx = DXIL::OperandIndex::kRawBufferVectorStoreValOpIdx;
unsigned ValueIx = DXIL::OperandIndex::kRawBufferVectorStoreValOpIdx;

DxilInst_RawBufferLoad bufLd(CI);
LLVM_FALLTHROUGH;
case DXIL::OpCode::RawBufferVectorLoad: {
Value *handle =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Value *handle =
Value *Handle =

} else {
alignSize = bufLd.get_alignment_val();
}
unsigned alignIdx = DXIL::OperandIndex::kRawBufferLoadAlignmentOpIdx;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unsigned alignIdx = DXIL::OperandIndex::kRawBufferLoadAlignmentOpIdx;
unsigned AlignIdx = DXIL::OperandIndex::kRawBufferLoadAlignmentOpIdx;

if (!isa<ConstantInt>(CI->getOperand(alignIdx)))
ValCtx.EmitInstrError(CI, ValidationRule::InstrConstAlignForRawBuf);

Value *offset =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Value *offset =
Value *Offset =

if (!isa<ConstantInt>(CI->getOperand(alignIdx)))
ValCtx.EmitInstrError(CI, ValidationRule::InstrConstAlignForRawBuf);

Value *offset =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Value *offset =
Value *Offset =

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are solutions for doing stacked PRs (https://github.com/modular/stack-pr, https://graphite.dev/blog/stacked-prs, and others). The GitHub UI for "commits" is not a good tool for reviewing completely separable changes, and relying on someone seeing a comment in the UI is lossy at best.

New passes are always separable because by LLVM's design they can be run with IR inputs and outputs and be completely independent of other changes. They make a great boundary for separating pull requests.

@pow2clk
Copy link
Collaborator Author

pow2clk commented Apr 3, 2025

I've taken the shortest path and because I really don't want to go over this again for this same file at least, I've done this #7307

tex3d
tex3d previously approved these changes Apr 3, 2025
Copy link
Contributor

@tex3d tex3d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Just a minor comment about RUN lines in hlsl files under DXILValidation where these are not run as tests. I know other files have similar issues, but it's been misleading. Sometimes, other contributors will place tests here, thinking that it's the right location for validation shell tests, and they are simply ignored.

@pow2clk
Copy link
Collaborator Author

pow2clk commented Apr 4, 2025

Well, I wanted to see if resolving conflicts online would still require a new review and it does

Greg Roth added 2 commits April 4, 2025 10:53
Some of these didn't come out quite right either from the manual or
automatic merge.
@pow2clk pow2clk merged commit 0ffd60a into microsoft:main Apr 4, 2025
12 checks passed
@pow2clk pow2clk deleted the nvec_ldst_pr branch April 4, 2025 20:10
@github-project-automation github-project-automation bot moved this from New to Done in HLSL Roadmap Apr 4, 2025
@toji
Copy link

toji commented Apr 4, 2025

It seems like this may have broken our ability to roll the shader compiler into Dawn for Chromium (WebGPU).

Attempting to update to ToT with this change included starts producing the following compiler errors similar to the following for us:

FAILED: 71219ff8-ffe6-4725-9dbc-3e16cf060fd2 "./libdxcompiler.so" SOLINK ./libdxcompiler.so
err: exit=1
"python3" "../../build/toolchain/gcc_solink_wrapper.py" --readelf="../../third_party/llvm-build/Release+Asserts/bin/llvm-readelf" --nm="../../third_party/llvm-build/Release+Asserts/bin/llvm-nm"  --sofile="./libdxcompiler.so" --tocfile="./libdxcompiler.so.TOC" --output="./libdxcompiler.so" -- ../../third_party/llvm-build/Release+Asserts/bin/clang++ -shared -Wl,-soname="libdxcompiler.so" -fuse-ld=lld -Wl,--fatal-warnings -Wl,--build-id=fast -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,--color-diagnostics -Wl,--undefined-version -Wl,--no-call-graph-profile-sort -m64 -no-canonical-prefixes -Wl,--gdb-index -Wl,-z,defs -Wl,--as-needed -nostdlib++ --sysroot=../../build/linux/debian_bullseye_amd64-sysroot -Werror -o "./libdxcompiler.so" @"./libdxcompiler.so.rsp"  
build step: solink "./libdxcompiler.so"
siso_rule: clang/solink
stderr:
ld.lld: error: lib_hlsl/DxilLinker.o: DWARF unit at offset 0x00000000 has unsupported version 0, supported are 2-5
ld.lld: error: undefined symbol: llvm::createDxilScalarizeVectorLoadStoresPass()
>>> referenced by DxilLinker.cpp
>>>               lib_hlsl/DxilLinker.o:((anonymous namespace)::DxilLinkJob::RunPreparePass(llvm::Module&)) in archive obj/third_party/gn/dxc/lib_hlsl.a

ld.lld: error: lib_hlsl/DxcOptimizer.o: DWARF unit at offset 0x00000000 has unsupported version 0, supported are 2-5
ld.lld: error: undefined symbol: llvm::initializeDxilScalarizeVectorLoadStoresPass(llvm::PassRegistry&)
>>> referenced by DxcOptimizer.cpp
>>>               lib_hlsl/DxcOptimizer.o:(hlsl::SetupRegistryPassForHLSL()) in archive obj/third_party/gn/dxc/lib_hlsl.a
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

Given that it can't resolve createDxilScalarizeVectorLoadStoresPass it definitely appears related to this PR.

@dneto0
Copy link
Collaborator

dneto0 commented Apr 7, 2025

It seems like this may have broken our ability to roll the shader compiler into Dawn for Chromium (WebGPU).

The fix was on our end: we had to add the new .cpp file to our BUILD.gn file.

https://dawn-review.googlesource.com/c/dawn/+/235214

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

7 participants