-
Notifications
You must be signed in to change notification settings - Fork 15k
[RISCV][TTI] Enable masked interleave access for scalable vector #149981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -1359,7 +1359,9 @@ class LoopVectorizationCostModel { | |||||||||||||||||
| return; | ||||||||||||||||||
| // Override EVL styles if needed. | ||||||||||||||||||
| // FIXME: Investigate opportunity for fixed vector factor. | ||||||||||||||||||
| // FIXME: Support interleave accesses. | ||||||||||||||||||
| bool EVLIsLegal = UserIC <= 1 && IsScalableVF && | ||||||||||||||||||
| !InterleaveInfo.hasGroups() && | ||||||||||||||||||
|
||||||||||||||||||
| static Value *getMask(Value *WideMask, unsigned Factor, | |
| ElementCount LeafValueEC) { | |
| if (auto *IMI = dyn_cast<IntrinsicInst>(WideMask)) { | |
| if (unsigned F = getInterleaveIntrinsicFactor(IMI->getIntrinsicID()); | |
| F && F == Factor && llvm::all_equal(IMI->args())) { | |
| return IMI->getArgOperand(0); | |
| } | |
| } |
So it should emit a vlseg with mask [T, T, T, F] on the first iteration, and [T, T, F, F] on the second iteration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, although I previously said the mask might be incorrect, it’s actually different from what you described.
; IF-EVL-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[EVL_BASED_IV]], i64 0
; IF-EVL-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 4 x i64> [[BROADCAST_SPLATINSERT1]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
; IF-EVL-NEXT: [[TMP10:%.*]] = call <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
; IF-EVL-NEXT: [[TMP12:%.*]] = add <vscale x 4 x i64> zeroinitializer, [[TMP10]]
; IF-EVL-NEXT: [[VEC_IV:%.*]] = add <vscale x 4 x i64> [[BROADCAST_SPLAT2]], [[TMP12]]
; IF-EVL-NEXT: [[TMP13:%.*]] = icmp ule <vscale x 4 x i64> [[VEC_IV]], [[BROADCAST_SPLAT]]
In reality, the two masks should be: [T, T, T, T, T, T, T, T] and [T, T, T, T, F, F, F, F].
For the first iteration:
([0, 0, 0, 0] + [0, 1, 2, 3]) <= [4, 4, 4, 4], where 4 is TripCount - 1.
For the second iteration:
([3, 3, 3, 3] + [0, 1, 2, 3]) <= [4, 4, 4, 4].
Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right. This looks like an underlying issue with mixing the header masks and EVL based IV.
I think we need to convert the header masks from icmp ule wide-canonical-iv, backedge-tc to icmp ult step-vector, EVL.
Because on the second-to-last iteration the original header mask isn't going to take into account the possible truncation.
So on the first iteration, it should really be:
[0, 1, 2, 3] < 3 = [T, T, T, F]
And on the second iteration
[0, 1, 2, 3] < 2 = [T, T, F, F].
I'll create an issue for this, this is a good find.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
; IF-EVL-NEXT: [[TMP9:%.*]] = getelementptr inbounds [2 x i32], ptr [[B:%.*]], i64 [[EVL_BASED_IV]], i32 0
But thinking about it more carefully, since the GEP has already been adjusted based on the EVL base PHI, even though the header mask is not in sync with the EVL, the final store result in the example should still be correct. It's just that addresses which were originally written to only once may now be written to multiple times. The value written the first time might be incorrect, but it will eventually be overwritten with the correct value. However, I'm not sure if this could cause other issues. In any case, I still recommend avoiding mixing tail folding modes—it doesn’t seem entirely safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just that addresses which were originally written to only once may now be written to multiple times. The value written the first time might be incorrect, but it will eventually be overwritten with the correct value.
Just FYI, this reasoning is suspect. Introducing a duplicate write to the same location is possibly a violation of the memory model. I'd have to think that through carefully to see if it was, but it's definitely undesirable if we can reasonable avoid.
(Topic switch)
If I'm following this thread correctly, the issue being theorized here is a problem for any interleave group with EVL right? Not just the masked variants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replying to myself after reading more context - yeah, this is a generic EVL bug.
@Mel-Chen Would you mind separating your EVLIsLegal change into a separate review with a test case which shows the bug? I'd like to get EVL disabled for this case while Luke's correctness change works it way through review.
Edit: Once we do that, I can enable the masked interleave support (for what ends up being masking and non-predicated only), then we can remove the limitation once the root issue is properly fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about this later today, and realized I'd missed the obvious here. While the bug is theoretically reachable through other paths, the combination of predicatation and interleave-groups currently only happens with this patch to enable the predication. EVL without this patch can't hit this code path; I'd missed that when thinking about this earlier.
@Mel-Chen I'm tempted to just take your patch as the primary path forward here towards enabling masking interleave. Any concerns with that? On reflection, I don't know the splitting I'd suggested earlier is actually worthwhile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, I've landed the last change for the fixed vector path. You can delete UseMaskForCand entirely if you want, if not, I'll do it as a follow up review.