[mlir][vector] Tighten the semantics of vector.{load|store} #135151
Conversation
This change refines the verifier for `vector.load` and `vector.store` to disallow the use of vectors with higher rank than the source or destination memref. For example, the following is now rejected:

```mlir
%0 = vector.load %src[%c0] : memref<?xi8>, vector<16x16xi8>
vector.store %vec, %dest[%c0] : memref<?xi8>, vector<16x16xi8>
```

This pattern was previously used in SME end-to-end tests and "happened" to work by implicitly assuming row-major memory layout. However, there is no guarantee that such an assumption will always hold, and we should avoid relying on it unless it can be enforced deterministically. Notably, production ArmSME lowering pipelines do not rely on this behavior. Instead, the expected usage (illustrated here with scalable vector syntax) would be:

```mlir
%0 = vector.load %src[%c0, %c0] : memref<?x?xi8>, vector<[16]x[16]xi8>
```

This PR updates the verifier accordingly and adjusts all affected tests. These tests are either removed (if no longer relevant) or updated to use memrefs with appropriately matching rank.
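For downstream code that previously loaded a 2-D vector straight from a rank-1 buffer, one possible migration (not prescribed by this PR) is to reshape the memref so the ranks match before loading. A minimal sketch, assuming a row-major buffer with a known inner size of 16; `%expanded`, `%d0`, and the reassociation are illustrative, not taken from the patch:

```mlir
// Hypothetical rewrite: give the rank-1 buffer an explicit 2-D shape first.
// %d0 (the number of rows) is assumed to be computed elsewhere.
%expanded = memref.expand_shape %src [[0, 1]] output_shape [%d0, 16]
    : memref<?xi8> into memref<?x16xi8>
// Now the memref rank (2) matches the vector rank (2), so this verifies.
%0 = vector.load %expanded[%c0, %c0] : memref<?x16xi8>, vector<16x16xi8>
```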
@llvm/pr-subscribers-mlir-sme @llvm/pr-subscribers-mlir

Author: Andrzej Warzyński (banach-space)

Changes: see the description above.

Patch is 20.83 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/135151.diff

7 Files Affected:
diff --git a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
index 98d98f067de14..8b70a6b60a1ec 100644
--- a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
+++ b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
@@ -5099,6 +5099,10 @@ LogicalResult vector::LoadOp::verify() {
if (failed(verifyLoadStoreMemRefLayout(*this, resVecTy, memRefTy)))
return failure();
+ if (memRefTy.getRank() < resVecTy.getRank())
+ return emitOpError(
+ "destination memref has lower rank than the result vector");
+
// Checks for vector memrefs.
Type memElemTy = memRefTy.getElementType();
if (auto memVecTy = llvm::dyn_cast<VectorType>(memElemTy)) {
@@ -5131,6 +5135,9 @@ LogicalResult vector::StoreOp::verify() {
if (failed(verifyLoadStoreMemRefLayout(*this, valueVecTy, memRefTy)))
return failure();
+ if (memRefTy.getRank() < valueVecTy.getRank())
+ return emitOpError("source memref has lower rank than the vector to store");
+
// Checks for vector memrefs.
Type memElemTy = memRefTy.getElementType();
if (auto memVecTy = llvm::dyn_cast<VectorType>(memElemTy)) {
diff --git a/mlir/test/Conversion/VectorToArmSME/vector-to-arm-sme.mlir b/mlir/test/Conversion/VectorToArmSME/vector-to-arm-sme.mlir
index 0f973af799634..c8a434bb8f5de 100644
--- a/mlir/test/Conversion/VectorToArmSME/vector-to-arm-sme.mlir
+++ b/mlir/test/Conversion/VectorToArmSME/vector-to-arm-sme.mlir
@@ -718,18 +718,6 @@ func.func @vector_load_i8_with_offset(%arg0 : memref<?x?xi8>) -> vector<[16]x[16
// -----
-// CHECK-LABEL: @vector_load_i8_from_rank_1_memref(
-// CHECK-SAME: %[[MEMREF:.*]]: memref<?xi8>)
-// CHECK: %[[C0:.*]] = arith.constant 0 : index
-// CHECK: arm_sme.tile_load %[[MEMREF]][%[[C0]]] : memref<?xi8>, vector<[16]x[16]xi8>
-func.func @vector_load_i8_from_rank_1_memref(%arg0 : memref<?xi8>) -> vector<[16]x[16]xi8> {
- %c0 = arith.constant 0 : index
- %tile = vector.load %arg0[%c0] : memref<?xi8>, vector<[16]x[16]xi8>
- return %tile : vector<[16]x[16]xi8>
-}
-
-// -----
-
// CHECK-LABEL: @vector_load_i16(
// CHECK: arm_sme.tile_load {{.*}} : memref<?x?xi16>, vector<[8]x[8]xi16>
func.func @vector_load_i16(%arg0 : memref<?x?xi16>) -> vector<[8]x[8]xi16> {
diff --git a/mlir/test/Dialect/MemRef/fold-memref-alias-ops.mlir b/mlir/test/Dialect/MemRef/fold-memref-alias-ops.mlir
index 067cdb5c5fd20..3160fd9c65c04 100644
--- a/mlir/test/Dialect/MemRef/fold-memref-alias-ops.mlir
+++ b/mlir/test/Dialect/MemRef/fold-memref-alias-ops.mlir
@@ -819,18 +819,29 @@ func.func @test_ldmatrix(%arg0: memref<4x32x32xf16, 3>, %arg1: index, %arg2: ind
// -----
-func.func @fold_vector_load_subview(
- %arg0 : memref<12x32xf32>, %arg1 : index, %arg2 : index) -> vector<12x32xf32> {
- %0 = memref.subview %arg0[%arg1, %arg2][1, 1][1, 1] : memref<12x32xf32> to memref<f32, strided<[], offset: ?>>
- %1 = vector.load %0[] : memref<f32, strided<[], offset: ?>>, vector<12x32xf32>
- return %1 : vector<12x32xf32>
+func.func @fold_vector_load_subview(%src : memref<24x64xf32>,
+ %off1 : index,
+ %off2 : index,
+ %dim1 : index,
+ %dim2 : index,
+ %idx : index) -> vector<12x32xf32> {
+
+ %0 = memref.subview %src[%off1, %off2][%dim1, %dim2][1, 1] : memref<24x64xf32> to memref<?x?xf32, strided<[64, 1], offset: ?>>
+ %1 = vector.load %0[%idx, %idx] : memref<?x?xf32, strided<[64, 1], offset: ?>>, vector<12x32xf32>
+ return %1 : vector<12x32xf32>
}
-// CHECK: func @fold_vector_load_subview
-// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: memref<12x32xf32>
-// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: index
-// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]+]]: index
-// CHECK: vector.load %[[ARG0]][%[[ARG1]], %[[ARG2]]] : memref<12x32xf32>, vector<12x32xf32>
+// CHECK: #[[$ATTR_46:.+]] = affine_map<()[s0, s1] -> (s0 + s1)>
+// CHECK-LABEL: func.func @fold_vector_load_subview(
+// CHECK-SAME: %[[SRC:[a-zA-Z0-9$._-]*]]: memref<24x64xf32>,
+// CHECK-SAME: %[[OFF_1:[a-zA-Z0-9$._-]*]]: index,
+// CHECK-SAME: %[[OFF_2:[a-zA-Z0-9$._-]*]]: index,
+// CHECK-SAME: %[[DIM_1:[a-zA-Z0-9$._-]*]]: index,
+// CHECK-SAME: %[[DIM_2:[a-zA-Z0-9$._-]*]]: index,
+// CHECK-SAME: %[[IDX:[a-zA-Z0-9$._-]*]]: index) -> vector<12x32xf32> {
+// CHECK: %[[VAL_6:.*]] = affine.apply #[[$ATTR_46]](){{\[}}%[[OFF_1]], %[[IDX]]]
+// CHECK: %[[VAL_7:.*]] = affine.apply #[[$ATTR_46]](){{\[}}%[[OFF_2]], %[[IDX]]]
+// CHECK: %[[VAL_8:.*]] = vector.load %[[SRC]]{{\[}}%[[VAL_6]], %[[VAL_7]]] : memref<24x64xf32>, vector<12x32xf32>
// -----
@@ -851,20 +862,32 @@ func.func @fold_vector_maskedload_subview(
// -----
-func.func @fold_vector_store_subview(
- %arg0 : memref<12x32xf32>, %arg1 : index, %arg2 : index, %arg3: vector<2x32xf32>) -> () {
- %0 = memref.subview %arg0[%arg1, %arg2][1, 1][1, 1] : memref<12x32xf32> to memref<f32, strided<[], offset: ?>>
- vector.store %arg3, %0[] : memref<f32, strided<[], offset: ?>>, vector<2x32xf32>
- return
+func.func @fold_vector_store_subview(%src : memref<24x64xf32>,
+ %off1 : index,
+ %off2 : index,
+ %vec: vector<2x32xf32>,
+ %idx : index,
+ %dim1 : index,
+ %dim2 : index) -> () {
+
+ %0 = memref.subview %src[%off1, %off2][%dim1, %dim2][1, 1] : memref<24x64xf32> to memref<?x?xf32, strided<[64, 1], offset: ?>>
+ vector.store %vec, %0[%idx, %idx] : memref<?x?xf32, strided<[64, 1], offset: ?>> , vector<2x32xf32>
+ return
}
-// CHECK: func @fold_vector_store_subview
-// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: memref<12x32xf32>
-// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: index
-// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]+]]: index
-// CHECK-SAME: %[[ARG3:[a-zA-Z0-9_]+]]: vector<2x32xf32>
-// CHECK: vector.store %[[ARG3]], %[[ARG0]][%[[ARG1]], %[[ARG2]]] : memref<12x32xf32>, vector<2x32xf32>
-// CHECK: return
+// CHECK: #[[$ATTR_47:.+]] = affine_map<()[s0, s1] -> (s0 + s1)>
+
+// CHECK-LABEL: func.func @fold_vector_store_subview(
+// CHECK-SAME: %[[SRC:[a-zA-Z0-9$._-]*]]: memref<24x64xf32>,
+// CHECK-SAME: %[[OFF1:[a-zA-Z0-9$._-]*]]: index,
+// CHECK-SAME: %[[OFF_2:[a-zA-Z0-9$._-]*]]: index,
+// CHECK-SAME: %[[VEC:[a-zA-Z0-9$._-]*]]: vector<2x32xf32>,
+// CHECK-SAME: %[[IDX:[a-zA-Z0-9$._-]*]]: index,
+// CHECK-SAME: %[[VAL_5:[a-zA-Z0-9$._-]*]]: index,
+// CHECK-SAME: %[[VAL_6:[a-zA-Z0-9$._-]*]]: index) {
+// CHECK: %[[VAL_7:.*]] = affine.apply #[[$ATTR_47]](){{\[}}%[[OFF1]], %[[IDX]]]
+// CHECK: %[[VAL_8:.*]] = affine.apply #[[$ATTR_47]](){{\[}}%[[OFF_2]], %[[IDX]]]
+// CHECK: vector.store %[[VEC]], %[[SRC]]{{\[}}%[[VAL_7]], %[[VAL_8]]] : memref<24x64xf32>, vector<2x32xf32>
// -----
diff --git a/mlir/test/Dialect/Vector/invalid.mlir b/mlir/test/Dialect/Vector/invalid.mlir
index ea6d0021391fb..f7192fbf68b4e 100644
--- a/mlir/test/Dialect/Vector/invalid.mlir
+++ b/mlir/test/Dialect/Vector/invalid.mlir
@@ -1743,13 +1743,11 @@ func.func @invalid_outerproduct(%src : memref<?xf32>) {
// -----
-func.func @invalid_outerproduct1(%src : memref<?xf32>) {
+func.func @invalid_outerproduct1(%src : memref<?xf32>, %lhs : vector<[4]x[4]xf32>, %rhs : vector<[4]xf32>) {
%idx = arith.constant 0 : index
- %0 = vector.load %src[%idx] : memref<?xf32>, vector<[4]x[4]xf32>
- %1 = vector.load %src[%idx] : memref<?xf32>, vector<[4]xf32>
// expected-error @+1 {{'vector.outerproduct' op expected 1-d vector for operand #1}}
- %op = vector.outerproduct %0, %1 : vector<[4]x[4]xf32>, vector<[4]xf32>
+ %op = vector.outerproduct %lhs, %rhs : vector<[4]x[4]xf32>, vector<[4]xf32>
}
// -----
@@ -1870,3 +1868,29 @@ func.func @flat_transpose_scalable(%arg0: vector<[16]xf32>) -> vector<[16]xf32>
: vector<[16]xf32> -> vector<[16]xf32>
return %0 : vector<[16]xf32>
}
+
+// -----
+
+//===----------------------------------------------------------------------===//
+// vector.load
+//===----------------------------------------------------------------------===//
+
+func.func @vector_load(%src : memref<?xi8>) {
+ %c0 = arith.constant 0 : index
+ // expected-error @+1 {{'vector.load' op destination memref has lower rank than the result vector}}
+ %0 = vector.load %src[%c0] : memref<?xi8>, vector<16x16xi8>
+ return
+}
+
+// -----
+
+//===----------------------------------------------------------------------===//
+// vector.store
+//===----------------------------------------------------------------------===//
+
+func.func @vector_store(%dest : memref<?xi8>, %vec : vector<16x16xi8>) {
+ %c0 = arith.constant 0 : index
+ // expected-error @+1 {{'vector.store' op source memref has lower rank than the vector to store}}
+ vector.store %vec, %dest[%c0] : memref<?xi8>, vector<16x16xi8>
+ return
+}
diff --git a/mlir/test/Dialect/Vector/vector-transfer-to-vector-load-store.mlir b/mlir/test/Dialect/Vector/vector-transfer-to-vector-load-store.mlir
index fd50acf03e79b..511ab70f35086 100644
--- a/mlir/test/Dialect/Vector/vector-transfer-to-vector-load-store.mlir
+++ b/mlir/test/Dialect/Vector/vector-transfer-to-vector-load-store.mlir
@@ -2,8 +2,8 @@
// CHECK-LABEL: func @vector_transfer_ops_0d_memref(
// CHECK-SAME: %[[MEM:.*]]: memref<f32>
-// CHECK-SAME: %[[VEC:.*]]: vector<1x1x1xf32>
-func.func @vector_transfer_ops_0d_memref(%mem: memref<f32>, %vec: vector<1x1x1xf32>) {
+// CHECK-SAME: %[[VEC:.*]]: vector<f32>
+func.func @vector_transfer_ops_0d_memref(%mem: memref<f32>, %vec: vector<f32>) {
%f0 = arith.constant 0.0 : f32
// CHECK-NEXT: %[[S:.*]] = vector.load %[[MEM]][] : memref<f32>, vector<f32>
@@ -12,8 +12,8 @@ func.func @vector_transfer_ops_0d_memref(%mem: memref<f32>, %vec: vector<1x1x1xf
// CHECK-NEXT: vector.store %[[S]], %[[MEM]][] : memref<f32>, vector<f32>
vector.transfer_write %0, %mem[] : vector<f32>, memref<f32>
-// CHECK-NEXT: vector.store %[[VEC]], %[[MEM]][] : memref<f32>, vector<1x1x1xf32>
- vector.store %vec, %mem[] : memref<f32>, vector<1x1x1xf32>
+// CHECK-NEXT: vector.store %[[VEC]], %[[MEM]][] : memref<f32>, vector<f32>
+ vector.store %vec, %mem[] : memref<f32>, vector<f32>
return
}
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/transpose.mlir b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/transpose.mlir
index ff20f99b63cd1..b44658eef4e11 100644
--- a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/transpose.mlir
+++ b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/transpose.mlir
@@ -17,7 +17,7 @@ func.func @entry() {
%za_s_size = arith.muli %svl_s, %svl_s : index
// Allocate memory.
- %mem1 = memref.alloca(%za_s_size) : memref<?xi32>
+ %mem1 = memref.alloca(%za_s_size, %svl_s) : memref<?x?xi32>
// Fill each "row" of "mem1" with row number.
//
@@ -29,15 +29,15 @@ func.func @entry() {
// 3, 3, 3, 3
//
%init_0 = arith.constant 0 : i32
- scf.for %i = %c0 to %za_s_size step %svl_s iter_args(%val = %init_0) -> (i32) {
+ scf.for %i = %c0 to %svl_s step %c1 iter_args(%val = %init_0) -> (i32) {
%splat_val = vector.broadcast %val : i32 to vector<[4]xi32>
- vector.store %splat_val, %mem1[%i] : memref<?xi32>, vector<[4]xi32>
+ vector.store %splat_val, %mem1[%i, %c0] : memref<?x?xi32>, vector<[4]xi32>
%val_next = arith.addi %val, %c1_i32 : i32
scf.yield %val_next : i32
}
// Load tile from "mem1".
- %tile = vector.load %mem1[%c0] : memref<?xi32>, vector<[4]x[4]xi32>
+ %tile = vector.load %mem1[%c0, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
// Transpose tile.
%transposed_tile = vector.transpose %tile, [1, 0] : vector<[4]x[4]xi32> to vector<[4]x[4]xi32>
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir
index 6e25bee65f095..09d68661c6e9d 100644
--- a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir
+++ b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir
@@ -34,11 +34,11 @@ func.func @za0_d_f64() -> i32 {
// 3.1, 3.1, 3.1, 3.1
//
%tilesize = arith.muli %svl_d, %svl_d : index
- %mem1 = memref.alloca(%tilesize) : memref<?xf64>
+ %mem1 = memref.alloca(%svl_d, %svl_d) : memref<?x?xf64>
%init_0 = arith.constant 0.1 : f64
- scf.for %i = %c0 to %tilesize step %svl_d iter_args(%val = %init_0) -> (f64) {
+ scf.for %i = %c0 to %svl_d step %c1_index iter_args(%val = %init_0) -> (f64) {
%splat_val = vector.broadcast %val : f64 to vector<[2]xf64>
- vector.store %splat_val, %mem1[%i] : memref<?xf64>, vector<[2]xf64>
+ vector.store %splat_val, %mem1[%i, %c0] : memref<?x?xf64>, vector<[2]xf64>
%val_next = arith.addf %val, %c1_f64 : f64
scf.yield %val_next : f64
}
@@ -48,27 +48,29 @@ func.func @za0_d_f64() -> i32 {
//
// CHECK-ZA0_D: ( 0.1, 0.1
// CHECK-ZA0_D-NEXT: ( 1.1, 1.1
- scf.for %i = %c0 to %tilesize step %svl_d {
- %tileslice = vector.load %mem1[%i] : memref<?xf64>, vector<[2]xf64>
+ scf.for %i = %c0 to %svl_d step %c1_index {
+ %tileslice = vector.load %mem1[%i, %c0] : memref<?x?xf64>, vector<[2]xf64>
vector.print %tileslice : vector<[2]xf64>
}
// Load ZA0.D from "mem1"
- %za0_d = vector.load %mem1[%c0] : memref<?xf64>, vector<[2]x[2]xf64>
+ %za0_d = vector.load %mem1[%c0, %c0] : memref<?x?xf64>, vector<[2]x[2]xf64>
// Allocate "mem2" to store ZA0.D to
- %mem2 = memref.alloca(%tilesize) : memref<?xf64>
+ %mem2 = memref.alloca(%svl_d, %svl_d) : memref<?x?xf64>
// Zero "mem2"
- scf.for %i = %c0 to %tilesize step %c1_index {
- memref.store %c0_f64, %mem2[%i] : memref<?xf64>
+ scf.for %i = %c0 to %svl_d step %c1_index {
+ scf.for %j = %c0 to %svl_d step %c1_index {
+ memref.store %c0_f64, %mem2[%i, %j] : memref<?x?xf64>
+ }
}
// Verify "mem2" is zeroed by doing an add reduction with initial value of
// zero
%init_0_f64 = arith.constant 0.0 : f64
- %add_reduce = scf.for %vnum = %c0 to %tilesize step %svl_d iter_args(%iter = %init_0_f64) -> (f64) {
- %row = vector.load %mem2[%vnum] : memref<?xf64>, vector<[2]xf64>
+ %add_reduce = scf.for %vnum = %c0 to %svl_d step %c1_index iter_args(%iter = %init_0_f64) -> (f64) {
+ %row = vector.load %mem2[%vnum, %c0] : memref<?x?xf64>, vector<[2]xf64>
%inner_add_reduce = scf.for %offset = %c0 to %svl_d step %c1_index iter_args(%inner_iter = %init_0_f64) -> (f64) {
%t = vector.extractelement %row[%offset : index] : vector<[2]xf64>
@@ -88,16 +90,16 @@ func.func @za0_d_f64() -> i32 {
//
// CHECK-ZA0_D-NEXT: ( 0, 0
// CHECK-ZA0_D-NEXT: ( 0, 0
- scf.for %i = %c0 to %tilesize step %svl_d {
- %tileslice = vector.load %mem2[%i] : memref<?xf64>, vector<[2]xf64>
+ scf.for %i = %c0 to %svl_d step %c1_index{
+ %tileslice = vector.load %mem2[%i, %c0] : memref<?x?xf64>, vector<[2]xf64>
vector.print %tileslice : vector<[2]xf64>
}
// Verify "mem1" != "mem2"
%init_1 = arith.constant 1 : i64
- %mul_reduce_0 = scf.for %vnum = %c0 to %tilesize step %svl_d iter_args(%iter = %init_1) -> (i64) {
- %row_1 = vector.load %mem1[%vnum] : memref<?xf64>, vector<[2]xf64>
- %row_2 = vector.load %mem2[%vnum] : memref<?xf64>, vector<[2]xf64>
+ %mul_reduce_0 = scf.for %vnum = %c0 to %svl_d step %c1_index iter_args(%iter = %init_1) -> (i64) {
+ %row_1 = vector.load %mem1[%vnum, %c0] : memref<?x?xf64>, vector<[2]xf64>
+ %row_2 = vector.load %mem2[%vnum, %c0] : memref<?x?xf64>, vector<[2]xf64>
%cmp = arith.cmpf one, %row_1, %row_2 : vector<[2]xf64>
%inner_mul_reduce = scf.for %i = %c0 to %svl_d step %c1_index iter_args(%inner_iter = %init_1) -> (i64) {
@@ -115,12 +117,12 @@ func.func @za0_d_f64() -> i32 {
vector.print %mul_reduce_0 : i64
// Store ZA0.D to "mem2"
- vector.store %za0_d, %mem2[%c0] : memref<?xf64>, vector<[2]x[2]xf64>
+ vector.store %za0_d, %mem2[%c0, %c0] : memref<?x?xf64>, vector<[2]x[2]xf64>
// Verify "mem1" == "mem2"
- %mul_reduce_1 = scf.for %vnum = %c0 to %tilesize step %svl_d iter_args(%iter = %init_1) -> (i64) {
- %row_1 = vector.load %mem1[%vnum] : memref<?xf64>, vector<[2]xf64>
- %row_2 = vector.load %mem2[%vnum] : memref<?xf64>, vector<[2]xf64>
+ %mul_reduce_1 = scf.for %vnum = %c0 to %svl_d step %c1_index iter_args(%iter = %init_1) -> (i64) {
+ %row_1 = vector.load %mem1[%vnum, %c0] : memref<?x?xf64>, vector<[2]xf64>
+ %row_2 = vector.load %mem2[%vnum, %c0] : memref<?x?xf64>, vector<[2]xf64>
%cmp = arith.cmpf oeq, %row_1, %row_2 : vector<[2]xf64>
%inner_mul_reduce = scf.for %i = %c0 to %svl_d step %c1_index iter_args(%inner_iter = %init_1) -> (i64) {
@@ -142,8 +144,8 @@ func.func @za0_d_f64() -> i32 {
//
// CHECK-ZA0_D-NEXT: ( 0.1, 0.1
// CHECK-ZA0_D-NEXT: ( 1.1, 1.1
- scf.for %i = %c0 to %tilesize step %svl_d {
- %tileslice = vector.load %mem2[%i] : memref<?xf64>, vector<[2]xf64>
+ scf.for %i = %c0 to %svl_d step %c1_index{
+ %tileslice = vector.load %mem2[%i, %c0] : memref<?x?xf64>, vector<[2]xf64>
vector.print %tileslice : vector<[2]xf64>
}
@@ -170,8 +172,9 @@ func.func @load_store_two_za_s_tiles() -> i32 {
// Allocate memory for two 32-bit element tiles.
%size_of_tile = arith.muli %svl_s, %svl_s : index
+ %svl_s_x_2 = arith.muli %svl_s, %c2_index : index
%size_of_two_tiles = arith.muli %size_of_tile, %c2_index : index
- %mem1 = memref.alloca(%size_of_two_tiles) : memref<?xi32>
+ %mem1 = memref.alloca(%svl_s_x_2, %svl_s) : memref<?x?xi32>
// Fill memory that tile 1 will be loaded from with '1' and '2' for tile 2.
//
@@ -191,15 +194,15 @@ func.func @load_store_two_za_s_tiles() -> i32 {
// 2, 2, 2, 2
// 2, 2, 2, 2
//
- scf.for %i = %c0 to %size_of_two_tiles step %svl_s {
- %isFirstTile = arith.cmpi ult, %i, %size_of_tile : index
+ scf.for %i = %c0 to %svl_s_x_2 step %c1_index {
+ %isFirstTile = arith.cmpi ult, %i, %svl_s : index
%val = scf.if %isFirstTile -> i32 {
scf.yield %c1_i32 : i32
} else {
scf.yield %c2_i32 : i32
}
%splat_val = vector.broadcast %val : i32 to vector<[4]xi32>
- vector.store %splat_val, %mem1[%i] : memref<?xi32>, vector<[4]xi32>
+ vector.store %splat_val, %mem1[%i, %c0] : memref<?x?xi32>, vector<[4]xi32>
}
// Dump "mem1". The smallest SVL is 128-bits so each tile will be at least
@@ -213,32 +216,32 @@ func.func @load_store_two_za_s_tiles() -> i32 {
// CHECK-NEXT: ( 2, 2, 2, 2
// CHECK-NEXT: ( 2, 2, 2, 2
// CHECK-NEXT: ( 2, 2, 2, 2
- scf.for %i = %c0 to %size_of_two_tiles step %svl_s {
- %tileslice = vector.load %mem1[%i] : memref<?xi32>, vector<[4]xi32>
+ scf.for %i = %c0 to %svl_s_x_2 step %c1_index {
+ %tileslice = vector.load %mem1[%i, %c0] : memref<?x?xi32>, vector<[4]xi32>
vector.print %tileslice : vector<[4]xi32>
}
// Load tile 1 from memory
- %za0_s = vector.load %mem1[%c0] : memref<?xi32>, vector<[4]x[4]xi32>
+ %za0_s = vector.load %mem1[%c0, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
// Load tile 2 from memory
- %za1_s = vector.load %mem1[%size_of_tile] : memref<?xi32>, vector<[4]x[4]xi32>
+ %za1_s = vector.load %mem1[%svl_s, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
// Allocate new memory to store tiles to
- %mem2 = memref.alloca(%size_of_two_tiles) : memref<?xi32>
+ %mem2 = memref.alloca(%svl_s_x_2, %svl_s) : memref<?x?xi32>
// Zero new memory
- scf.for %i = %c0 to %size_of_two_tiles step %c1_index {
- memref.store %c0_i32, %mem2[%i] : memref<?xi32>
+ scf.for %i = %c0 to %svl_s_x_2 step %c1_index {
+ memref.store %c0_i32, %mem2[%i, %c0] : memref<?x?xi32>
}
// Stores tiles back to (new) memory in reverse order
// Store tile 2 to memory
- vector.store %za1_s, %mem2[%c0] : memref<?xi32>, vector<[4]x[4]xi32>
+ vector.store %za1_s, %mem2[%c0, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
// Store tile 1 to memory
- vector.store %za0_s, %mem2[%size_of_tile] : memref<?xi32>, vector<[4]x[4]xi32>
+ vector.store %za0_s, %mem2[%svl_s, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
// Dump "mem2" and check the tiles were stored in reverse order. The smallest
// SV...
[truncated]
// CHECK-LABEL: @vector_load_i8_from_rank_1_memref(
// CHECK-SAME: %[[MEMREF:.*]]: memref<?xi8>)
// CHECK: %[[C0:.*]] = arith.constant 0 : index
// CHECK: arm_sme.tile_load %[[MEMREF]][%[[C0]]] : memref<?xi8>, vector<[16]x[16]xi8> |
You could follow up to apply the same changes to ArmSME. There's some unused code in ArmSMEToSCF.cpp for dealing with rank-1 memrefs (e.g. in getMemrefIndices), and it's somewhat broken: #118769.
Thanks
LGTM, just spotted a few little things:
mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir
Address comments from Ben
This patch updates the following ArmSME ops to require that input and output element types match:

* `arm_sme.tile_load`, `arm_sme.tile_store`, `arm_sme.tile_load_slice`, `arm_sme.tile_store_slice`.

In addition, it ensures that the base memref operand for `tile_load` and `tile_store` is always rank-2, aligning with the semantics of Arm SME tiles (always rank-2). This change is effectively a follow-up to #135151:

* "[mlir][vector] Tighten the semantics of vector.{load|store}"

The patch also updates `createLoadStoreForOverTileSlices` in ArmSMEToSCF.cpp to fail when processing invalid tile stores like the following:

```mlir
arm_sme.tile_store %arg0, %arg1[%c0] : memref<?x4xi8>, vector<[4]x[4]xi32>
```

This particular change fixes #118769. As noted in the TODO, we should further extend op verification logic; I plan to address that in a follow-up patch.
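For contrast, a form that the tightened ArmSME verifier accepts pairs a rank-2 base memref with matching element types. A minimal sketch; the operand names `%tile` and `%dest` are illustrative, not taken from the patch:

```mlir
// Accepted: rank-2 base memref, i32 elements on both the memref and the tile.
arm_sme.tile_store %tile, %dest[%c0, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
```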