[SM6.9] Native vector load/store lowering #7292
Previously disallowed along with other global types, this provides a mechanism for indicating which buffers are raw, allowing them to contain long vectors while continuing to produce an error for other resource types, as verified by existing tests.
Add new raw buffer vector load/store intrinsics using the new vector overload types. Include them in the validation associated with similar loads/stores.
When the loaded/stored type is a vector of more than one element, the shader model is 6.9 or higher, and the operation is on a raw buffer, enable the generation of a native vector raw buffer load or store. Incidental removal of an unused parameter in load translation. Add validation and compute shader tests.
Native vector loads and stores are generated for 6.9 targets and above. This includes the 6.x target used when compiling to libraries. This adds a pass, run when linking, that lowers the vector operations to scalar operations for shader models that don't have native vector support. This allows libraries compiled for supporting shader models to be linked to targets without support.
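As a rough illustration of the gating described in the third commit above, the lowering condition amounts to something like the following sketch (IsSM69Plus appears elsewhere in this PR; the function name, header path, and other helpers here are assumptions, not the actual DXC code):

#include "llvm/IR/DerivedTypes.h"
#include "llvm/Support/Casting.h"
#include "dxc/DXIL/DxilShaderModel.h" // assumed header for hlsl::ShaderModel

// Sketch only: emit the native vector op when the value is a real vector
// (more than one element), the target is SM 6.9+, and the resource is a
// raw (byte-address or structured) buffer.
static bool useNativeVectorRawBufferOp(llvm::Type *Ty,
                                       const hlsl::ShaderModel *SM,
                                       bool IsRawBuffer) {
  auto *VT = llvm::dyn_cast<llvm::VectorType>(Ty);
  return VT && VT->getNumElements() > 1 && SM->IsSM69Plus() && IsRawBuffer;
}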
✅ With the latest revision this PR passed the C/C++ code formatter.
REVIEW GUIDANCE! The first four commits were designed to be reviewed individually, which might make the review more pleasant. That said, I don't expect the full review would be too difficult either.
I clang-formatted this to within an inch of its life. No idea why it didn't fail until now. |
A couple of minor things but this LGTM
| return "DXIL scalarize vector load/stores"; | ||
| } | ||
|
|
||
| bool runOnModule(Module &M) override { |
I think it's worthwhile to split this into two classes: the pass DxilScalarizeLoadStores, and a separate (local / anonymous namespace) VectorLoadStoreScalarizer. This separates concerns nicely and avoids the awkward situation of mutating the class member m_DM on every call to runOnModule. (It also means you can avoid the Hungarian-style m_ prefix on the member variable...)
Then the pass just needs:
bool runOnModule(Module &M) override {
  DxilModule &DM = M.GetOrCreateDxilModule();
  if (DM.GetShaderModel()->IsSM69Plus())
    return false;
  return VectorLoadStoreScalarizer(DM).scalarize();
}
The member DxilModule was actually unnecessary and I'd meant to remove it. Most of the simplification of runOnModule you've added here would just push the iteration down to the scalarization function. I don't mind where it goes, but the benefit, small though it is, of iterating in runOnModule is that I only have to iterate through the functions once.
I'm perfectly willing to consider it, but I think I remain unclear on the benefit of having separate classes.
I've split the load and store scalarizations into two non-method static functions.
  for (auto F = M.functions().begin(), E = M.functions().end(); F != E;) {
    Function *Func = &*(F++);
(side comment - no change needed)
It's too bad we don't have upstream llvm's make_early_inc_range available here.
Yeah, and their version of filecheck.
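For reference, a minimal sketch of the manual early-increment idiom the quoted loop uses, next to the upstream helper mentioned above (illustrative only; not code from this PR):

#include "llvm/IR/Module.h"

// Advance the iterator before touching the current function, so erasing
// that function does not invalidate the loop.
static void walkFunctions(llvm::Module &M) {
  for (auto F = M.functions().begin(), E = M.functions().end(); F != E;) {
    llvm::Function *Func = &*(F++); // increment first; Func may be erased below
    (void)Func; // ... rewrite or erase Func here ...
  }
  // With a newer LLVM this could instead be written as:
  //   for (llvm::Function &F : llvm::make_early_inc_range(M.functions()))
  // but that helper is not available in DXC's LLVM base.
}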
  Args.emplace_back(VecLd.get_alignment()); // Alignment @5.

  // Set offset to increment depending on whether the real offset is defined.
  unsigned OffsetIdx = 0;
There's some debate about this, so feel free to use your own preference, but I would leave OffsetIdx uninitialized here since that will give a nice compiler warning if you forget to set it in one of the branches of the conditional below.
Makes sense.
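For illustration, a tiny hypothetical example of the warning being relied on here (the names and values are made up; the exact diagnostic depends on compiler and flags, e.g. clang's -Wsometimes-uninitialized or gcc's -Wmaybe-uninitialized):

// Hypothetical: leaving the variable uninitialized lets the compiler flag a
// branch that forgets to assign it before use.
unsigned pickOffsetIdx(bool HasRealOffset) {
  unsigned OffsetIdx; // intentionally uninitialized
  if (HasRealOffset)
    OffsetIdx = 5; // made-up index
  // A forgotten else branch produces a "may be used uninitialized" warning
  // at the return below; initializing to 0 would silence it.
  return OffsetIdx;
}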
Turned out to be unneeded. This reverts commit cdaca3f.
Use IS_BASIC_OBJECT to allow global raw buffers with long vectors; the rawbuffer bits weren't needed. Remove the unused DxilModule from the vector load/store scalarize pass. Leave the sure-to-be-assigned variable uninitialized. Add a test that long vectors in pre-6.9 StructuredBuffers produce an error. Exclude the final 1.9 DXIL opcode for now.
not a request, but a suggestion for the future:
This PR would be significantly smaller and easier to review if this pass had been added in its own PR with its own tests and no other changes. A subsequent PR could add it to the pipeline and add the additional required changes to integrate it.
When I have a large block of code to commit that is interrelated but separable, and I'm unsure how best to economize reviewer time in the perennial conflict between doing many reviews versus one big one, I try to get the best of both by dividing the separable portions into sensible commits and putting them in one PR. That's what I did here, as I explained in this comment: #7292 (comment)
I've seen evidence that these comments are often missed though. I wonder if putting an inline comment at the top of the change would be more effective.
The individual PR you describe is essentially this commit: 27f0e73
A lot of this PR is tests; this change unblocked a lot of testing, and it was always going to open the floodgates of tests for the combinations of elements it enabled.
There are solutions for doing stacked PRs (https://github.com/modular/stack-pr, https://graphite.dev/blog/stacked-prs, and others). The GitHub UI for "commits" is not a good tool for reviewing completely separable changes, and relying on someone seeing a comment in the UI is lossy at best.
New passes are always separable because by LLVM's design they can be run with IR inputs and outputs and be completely independent of other changes. They make a great boundary for separating pull requests.
Have you tried it? I'd be interested to know what happens when the PR that is depended on changes. Faced with this situation again, I was thinking I'd just create a PR against the topic branch of the first PR.
That's not really the problem I meant to solve by making this one PR with logically separated commits, though. I wanted this to go in together, both to economize reviewer time (one change to look at) and because of the interdependencies. Several things could have gone in before others, but they would have been incompletely tested.
I'm not saying I made the right decision here, but I see other problems with opposing logically separated commits just to avoid having to read the comments of a PR before reviewing.
  // Collect the information required to break this into scalar ops from args.
  DxilInst_RawBufferVectorLoad VecLd(CI);
  OP::OpCode OpCode = OP::OpCode::RawBufferLoad;
  llvm::Constant *opArg = Builder.getInt32((unsigned)OpCode);
Suggested change:
- llvm::Constant *opArg = Builder.getInt32((unsigned)OpCode);
+ llvm::Constant *OpArg = Builder.getInt32((unsigned)OpCode);
New code should follow LLVM coding standards.
Missed this one.
// CHECK: %dx.types.ResRet.[[VTY:v[0-9]*[a-z][0-9][0-9]]] = type { <[[NUM:[0-9]*]] x [[TYPE:[a-z_0-9]*]]>, i32 }

  ByteAddressBuffer RoByBuf : register(t1);
take or leave: the leading whitespace bothers me greatly... I know we don't really have a policy on formatting HLSL, but this reminds me too much of the Objective-C tendency to align colons, which I've always found grating.
The intention here was to keep the corresponding portions aligned so that, at a glance, it's clear that these are both ByteAddressBuffers and one is RW though I wasn't consistent with that elsewhere. I don't mind removing it.
RWByteAddressBuffer RwByBuf : register(u1);

StructuredBuffer< vector<TYPE, NUM> > RoStBuf : register(t2);
RWStructuredBuffer< vector<TYPE, NUM> > RwStBuf : register(u2);
also take or leave: I also find some of the double-spaces odd... I know we have C++03-ish templates so you need a space, but extra spaces seem to just make lines longer.
Not sure if it's the unintended double space after the > that bothers you, or the leading space before vector, which is unnecessary but, I think, helpful, though inconsistently applied.
@@ -0,0 +1,708 @@
// RUN: %dxc -HV 2018 -T cs_6_9 -DTYPE=float -DNUM=2 %s | FileCheck %s --check-prefixes=CHECK,NODBL,NOINT
Should we also be testing vector<T,1> here? Or potentially elsewhere?
It's a good suggestion, as they should be treated like scalars. Given how different the output will be, I'd prefer to do it elsewhere. I had intended to do the same thing to the operators-vec1s test as this one; originally I'd hoped they could remain part of the same tests, but the reorganization just got too involved, and I've been critiqued for how complicated I make my CHECK lines in such cases.
  bool Changed = false;

  hlsl::OP *HlslOP = DM.GetOP();
  for (auto F = M.functions().begin(), E = M.functions().end(); F != E;) {
You can iterate the overloads for specific ops you want instead with HlslOP::GetOpFuncList. For instance:
  for (auto it : hlslOP->GetOpFuncList(DXIL::OpCode::AnnotateHandle)) {
This is what I did originally. I ran into a lot of problems with it because GetOpFuncList was returning a list with duplicate entries for the same function leading to double deletion. I have a note to investigate that further, but in the meantime, rather than create a list of functions to delete that I iterate through again at the end to just delete them all, I opted to use the same iteration method that the cousin pass TranslateRawBuffer uses. I don't feel strongly about it though. I could probably do as the code you cite does and rely on later passes to eliminate the unused functions for me. I don't have an issue with that.
Oh, for one thing, erasing a function during iteration of this list will erase the entry from the list. Since this is a SmallMapVector, the iterator points into the vector, which will be invalidated by any modification. That doesn't seem like it should result specifically in duplicate entries and double-delete, but it is in UB territory.
If the original list contains duplicate entries, that is a cause for concern. We shouldn't allow that.
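For what it's worth, a rough sketch of the collect-then-erase alternative mentioned earlier, which avoids modifying the overload map while it is being iterated (the overload-type-to-Function* shape of GetOpFuncList's result is an assumption based on the SmallMapVector comment above; this is not the code in this PR):

llvm::SmallVector<llvm::Function *, 4> DeadFns;
for (auto &Entry : HlslOP->GetOpFuncList(DXIL::OpCode::RawBufferVectorLoad)) {
  llvm::Function *F = Entry.second; // assumed: overload type -> Function*
  if (!F)
    continue;
  // ... rewrite every call to F into scalar RawBufferLoad ops ...
  if (F->user_empty())
    DeadFns.push_back(F); // defer erasure until iteration is done
}
for (llvm::Function *F : DeadFns)
  F->eraseFromParent();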
    if (OpClass == DXIL::OpCodeClass::RawBufferVectorLoad) {
      for (auto U = Func->user_begin(), UE = Func->user_end(); U != UE;) {
        CallInst *CI = cast<CallInst>(*(U++));
        scalarizeVectorLoad(HlslOP, M.getDataLayout(), CI);
It's always good form, when iterating the users of a DXIL op class overload function, to check the opcode (even though you know that in this case it should be the only one for this OpCodeClass):
Suggested change:
-        scalarizeVectorLoad(HlslOP, M.getDataLayout(), CI);
+        if (HlslOP->GetDxilOpFuncCallInst(CI) !=
+            hlsl::OP::OpCode::RawBufferVectorLoad)
+          continue;
+        scalarizeVectorLoad(HlslOP, M.getDataLayout(), CI);
Presumably this wouldn't be required if I use the GetOpFuncList approach?
It turns out that you do technically have to check, even with GetOpFuncList. That's because this returns the list of overload functions for the OpCodeClass corresponding to the OpCode. Basically, that's all it could do, since different DXIL operations use the same function declaration when they share the same OpCodeClass.
I think the best way is to get the OpCode using: hlsl::OP::GetDxilOpFuncCallInst(CI) == hlsl::OP::OpCode::RawBufferVectorLoad (this avoids a redundant IsDxilOpFuncCallInst check on NDEBUG at least).
I looked at other code using this function and basically nobody is checking the OpCode of the CallInst users. I'm considering filing an issue about this, but I don't think there are any real bugs caused by this at the moment. Basically, you're safe if there's a 1:1 mapping between OpCode and OpCodeClass, which happens to be true in all current use cases. If not, since this always lists all the overloads for the OpCodeClass that corresponds to the OpCode passed in, you could be iterating CallInst users that aren't calling with the same OpCode you're looking for. I found one use of GetOpFuncList that looks like it could potentially have this problem, in OP::IsDxilOpUsed, but it is only called with OpCodes that are safe, so I don't think it causes any real bugs in practice (yet).
I think it would probably be better to make GetOpFuncList take the OpCodeClass instead of the OpCode. Then you're forced to HlslOp->GetOpFuncList(hlsl::OP::GetOpCodeClass(MyOpCode)) and it's more obvious that you need to check the OpCode for each CallInst user.
Regarding using hlsl::OP::GetDxilOpFuncCallInst, I guess hlsl::OP::getOpCode is just fine actually.
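Putting that together, a sketch of what the per-call check might look like (again assuming GetOpFuncList yields overload-type/Function* pairs and that hlsl::OP::getOpCode can be called on the CallInst as discussed; illustrative only, not the code in this PR):

for (auto &Entry : HlslOP->GetOpFuncList(DXIL::OpCode::RawBufferVectorLoad)) {
  llvm::Function *F = Entry.second; // assumed overload type -> Function*
  if (!F)
    continue;
  for (auto U = F->user_begin(), UE = F->user_end(); U != UE;) {
    auto *CI = llvm::dyn_cast<llvm::CallInst>(*(U++));
    // Skip users that call this declaration with a different opcode sharing
    // the same OpCodeClass.
    if (!CI || hlsl::OP::getOpCode(CI) != DXIL::OpCode::RawBufferVectorLoad)
      continue;
    scalarizeVectorLoad(HlslOP, M.getDataLayout(), CI);
  }
}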
Looking good. One correctness issue with the alignment for the load op, and a few minor comments.
Thanks for all the feedback! I'll respond to most of it in the next commit. The vec1 test will have its own commit as it'll take just a bit longer.
Separate the load/store lowering functions from the pass class in the vector scalarization pass. Calculate load size on the scalar element rather than the resret element type to get correct alignment for scalars instead of for the whole vector. Adjust various tests accordingly that depended on or used code generated with the incorrect alignment, which always just hit the max of 8. Change iteration through matching intrinsics in the link scalarization pass. Capitalize variables according to coding standards. Include and remove whitespace in tests.
Add cs test for vec1s. Reformat other operator tests to be more consistent.
Validate native vector loads and stores for properly defined parameters of the correct type. Add tests for both vector load/stores and the original scalar load/stores, since they share a lot of validation code.
    }
    LLVM_FALLTHROUGH;
  case DXIL::OpCode::RawBufferVectorStore: {
    Value *handle =
Suggested change:
-    Value *handle =
+    Value *Handle =
This file is extremely consistent: apart from a few exceptions, variables are not capitalized. I think the golden rule applies.
Really?
- m_
- Member PascalCase not m_
- Double Capital and camelCase
- On the next line camelCase and PascalCase
That's just up to line 101.
    if (resClass != DXIL::ResourceClass::UAV)
      ValCtx.EmitInstrError(CI, ValidationRule::InstrResourceClassForUAVStore);

    unsigned alignIdx = DXIL::OperandIndex::kRawBufferStoreAlignmentOpIdx;
Suggested change:
-    unsigned alignIdx = DXIL::OperandIndex::kRawBufferStoreAlignmentOpIdx;
+    unsigned AlignIdx = DXIL::OperandIndex::kRawBufferStoreAlignmentOpIdx;
    unsigned alignIdx = DXIL::OperandIndex::kRawBufferStoreAlignmentOpIdx;
    if (DXIL::OpCode::RawBufferVectorStore == opcode) {
      alignIdx = DXIL::OperandIndex::kRawBufferVectorStoreAlignmentOpIdx;
      unsigned valueIx = DXIL::OperandIndex::kRawBufferVectorStoreValOpIdx;
Suggested change:
-      unsigned valueIx = DXIL::OperandIndex::kRawBufferVectorStoreValOpIdx;
+      unsigned ValueIx = DXIL::OperandIndex::kRawBufferVectorStoreValOpIdx;
    DxilInst_RawBufferLoad bufLd(CI);
    LLVM_FALLTHROUGH;
  case DXIL::OpCode::RawBufferVectorLoad: {
    Value *handle =
Suggested change:
-    Value *handle =
+    Value *Handle =
    } else {
      alignSize = bufLd.get_alignment_val();
    }
    unsigned alignIdx = DXIL::OperandIndex::kRawBufferLoadAlignmentOpIdx;
Suggested change:
-    unsigned alignIdx = DXIL::OperandIndex::kRawBufferLoadAlignmentOpIdx;
+    unsigned AlignIdx = DXIL::OperandIndex::kRawBufferLoadAlignmentOpIdx;
    if (!isa<ConstantInt>(CI->getOperand(alignIdx)))
      ValCtx.EmitInstrError(CI, ValidationRule::InstrConstAlignForRawBuf);

    Value *offset =
Suggested change:
-    Value *offset =
+    Value *Offset =
    if (!isa<ConstantInt>(CI->getOperand(alignIdx)))
      ValCtx.EmitInstrError(CI, ValidationRule::InstrConstAlignForRawBuf);

    Value *offset =
Suggested change:
-    Value *offset =
+    Value *Offset =
I've taken the shortest path, and because I really don't want to go over this again for this same file at least, I've done this: #7307
Looks good!
Just a minor comment about RUN lines in HLSL files under DXILValidation, where these are not run as tests. I know other files have similar issues, but it's been misleading. Sometimes other contributors will place tests there, thinking that it's the right location for validation shell tests, and they are simply ignored.
Well, I wanted to see if resolving conflicts online would still require a new review, and it does.
Some of these didn't come out quite right from either the manual or the automatic merge.
It seems like this may have broken our ability to roll the shader compiler into Dawn for Chromium (WebGPU). Attempting to update to ToT with this change included starts producing compiler errors similar to the following for us: Given that it can't resolve
The fix was on our end: we had to add the new .cpp file to our BUILD.gn file.
Enables the declaration of long vector types for raw buffers and the lowering of those
and traditional vectors in loads and stores, maintaining the native types with
new DXIL ops, along with validation and testing support for the same.
Allow declaring long vector rawbuffer resources.
Previously disallowed along with other global types, this provides a
mechanism for indicating which buffers are raw, allowing them to
contain long vectors while continuing to produce an error for other
resource types, as verified by existing tests.
Introduce native vector DXIL load/store intrinsics.
Add new raw buffer vector load/store intrinsics using the new vector overload types.
Include them in the validation associated with similar loads/stores.
Lower native vector raw buffers load/stores into new ops.
When the loaded/stored type is a vector of more than 1 element, the
shader model is 6.9 or higher, and the operation is on a raw buffer,
enable the generation of a native vector raw buffer load or store.
Incidental removal of an unused parameter in load translation and some refactoring
of the lowering to flow better with the new resret types.
Add validation and compute shader tests.
Vector to scalar raw buffer load lowering pass
Native vector loads and stores are generated for 6.9 targets and above.
This includes the 6.x target used when compiling to libraries. This adds
a pass run when linking that will lower the vector operations to scalar
operations for shader models that don't have native vector support. This
allows libraries compiled for supporting shader models to be linked to
targets without support.