-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[mlir][arith] fix wrong floordivsi fold (#83079) #83248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-adt @llvm/pr-subscribers-mlir-arith Author: long.chen (lipracer) ChangesFixs #83079 Full diff: https://github.com/llvm/llvm-project/pull/83248.diff 2 Files Affected:
diff --git a/mlir/lib/Dialect/Arith/IR/ArithOps.cpp b/mlir/lib/Dialect/Arith/IR/ArithOps.cpp
index 0f71c19c23b654..d370b7d04dea7e 100644
--- a/mlir/lib/Dialect/Arith/IR/ArithOps.cpp
+++ b/mlir/lib/Dialect/Arith/IR/ArithOps.cpp
@@ -709,19 +709,25 @@ OpFoldResult arith::FloorDivSIOp::fold(FoldAdaptor adaptor) {
}
if (!aGtZero && !bGtZero) {
// Both negative, return -a / -b.
- APInt posA = zero.ssub_ov(a, overflowOrDiv0);
- APInt posB = zero.ssub_ov(b, overflowOrDiv0);
- return posA.sdiv_ov(posB, overflowOrDiv0);
+ return a.sdiv_ov(b, overflowOrDiv0);
}
if (!aGtZero && bGtZero) {
// A is negative, b is positive, return - ceil(-a, b).
APInt posA = zero.ssub_ov(a, overflowOrDiv0);
+ if (overflowOrDiv0)
+ return a;
APInt ceil = signedCeilNonnegInputs(posA, b, overflowOrDiv0);
+ if (overflowOrDiv0)
+ return a;
return zero.ssub_ov(ceil, overflowOrDiv0);
}
// A is positive, b is negative, return - ceil(a, -b).
APInt posB = zero.ssub_ov(b, overflowOrDiv0);
+ if (overflowOrDiv0)
+ return a;
APInt ceil = signedCeilNonnegInputs(a, posB, overflowOrDiv0);
+ if (overflowOrDiv0)
+ return a;
return zero.ssub_ov(ceil, overflowOrDiv0);
});
diff --git a/mlir/test/Transforms/canonicalize.mlir b/mlir/test/Transforms/canonicalize.mlir
index 2cf86b50d432f6..d2c2c12d323892 100644
--- a/mlir/test/Transforms/canonicalize.mlir
+++ b/mlir/test/Transforms/canonicalize.mlir
@@ -989,6 +989,15 @@ func.func @tensor_arith.floordivsi_by_one(%arg0: tensor<4x5xi32>) -> tensor<4x5x
return %res : tensor<4x5xi32>
}
+// CHECK-LABEL: func @arith.floordivsi_by_one_overflow
+func.func @arith.floordivsi_by_one_overflow() -> i64 {
+ %neg_one = arith.constant -1 : i64
+ %min_int = arith.constant -9223372036854775808 : i64
+ // CHECK: arith.floordivsi
+ %poision = arith.floordivsi %min_int, %neg_one : i64
+ return %poision : i64
+}
+
// -----
// CHECK-LABEL: func @arith.ceildivsi_by_one
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine upon a quick scan of the patch. Given the number or error code paths, I think it could be made more readable with something like constFoldBinaryOpConditional
IIRC.
Not sure this is the right patch, please see this comment |
@kuhar There are indeed many overflow states that need to be checked here, but I haven't come up with a good method yet. Can you give me a specific suggestion? Thank you. |
Without knowing the exact details of the floordivsi algorithm, the way I'd think about it is that we only need to detect if the final overflow happened or not. And of course fold correctly when there is no overflow. If there are intermediate overflows, I'd think that they can fold into one of the two buckets: (a) those that indicate that the floordivsi overflows for these arguments, and (b) implementation bugs. It might be helpful to extract this fold code to a unit test (say one of the APInt test files) and go from there. Pick a few inputs that are known to overflow and trace the values and intermediate overflows. Maybe write a separate check to decide if overflow happens for the given input instead of relying on some intermediate checks. |
@lipracer Another idea: we could perform the division on APInt with more bits than the bitwidth of the folded integer type. Maybe this way it'd be easier to detect overflow. And then once we are confident there's no overflow, truncate it back to the desired bitwidth. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lipracer could you open a separate PR for the APInt change? This will make it easier to review and land properly.
Ok. |
f231e71
to
0939c26
Compare
submit a APInt change PR. |
c241567
to
fcaa9e5
Compare
APInt change has already merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine upon a quick scan but I won't have the time to understand the details of the expansion in near future. Please get a second approval before landing.
Thank you very much for your valuable suggestion. |
@joker-eph I'd like someone to review my code, could you help me with that? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change almost certainly has performance implications. Do we have some way to measure or even guess at that? If this is fixing a correctness bug, I am also not the right person to verify that, but I trust the author in this
@pingshiyu would you be able to review the correctness of this change? |
Yes, this may indeed affect performance. This implementation is based on tensorflow xla expansion. For performance considerations, I think we can have a compilation option similar to ‘fast-expand’ for expansion. This change is only consistent with the behavior of llvm. |
sure, I've just taken some time to check through the cases, lgtm re correctness. thanks @lipracer! maybe some extra tests would be nice for the edge cases, e.g. |
@pingshiyu Thanks, I have added more corner tests. |
The CI error seems unrelated to this change. If the performance drops after merging, I will submit another change to enable "fast expand". |
The premerge is green at HEAD: can you rebase? https://lab.llvm.org/buildbot/#/builders/271 |
Nevermind I misread the log, it's failing on building flang by running out of memory:
I concur that it is unrelated, and all the MLIR tests are passing. |
Thanks, I will rebase on head. |
f3bcea4
to
1217ab5
Compare
1) fix floordivsi error expand logic 2) fix floordivsi fold did't check overflow stat Fixs llvm#83079
Fixs #83079