Description
This ticket is about the in_bounds
attribute from the Vector
s vector.transfer_read + vector.transfer_write Ops.
There are at least two places in the Linalg vectoriser where the computation of this attribute is too weak:
Note that in both cases, the write/read indices are ignored when computing the attribute (you will have to open the links). This means that the following example is always inferred as "in bounds". However, that should depend on the indices:
vector.transfer_write %14, %arg8[%arg3, %arg5] {in_bounds = [true, true]} : vector<8x4xf32>, memref<8x32xf32>
Effectively, vectorizeAsInsertSliceOp
and createWriteOrMaskedWrite
assume that the indices are always 0
, but that is not always the case (e.g. in the example above).
In #141244 I am refactoring vectorizeAsInsertSliceOp
to re-use createWriteOrMaskedWrite
- this way there will be only one hook to fix. Note that this issue also affects createReadOrMaskedRead, which is implemented in "VectorUtils.cpp" (the other hooks are implemented in "Vectorization.cpp").
Also in #141244, I am replacing this calculation:
for (unsigned i = 0; i < rank; i++)
inBoundsVal[i] = (destShape[i] == inputVecSizesForLeadingDims[i]) &&
!ShapedType::isDynamic(destShape[i]);
with (==
--> >=
):
for (unsigned i = 0; i < rank; i++)
inBoundsVal[i] = (destShape[i] >= inputVecSizesForLeadingDims[i]) &&
!ShapedType::isDynamic(destShape[i]);
If the write/read indices are 0
(and that's the only situation in which the computation above is valid), then both ==
and >=
are valid (i.e there is not need to require eq
, geq
is also fine).
Addressing this might be tricky - switching from "in-bounds" to "out-of-bounds" will mean that vector.maskedload
will be generated instead of vector.load
in many places.
NEXT STEPS:
- Land [mlir][linalg] Refactor vectorization hooks to improve code reuse #141244 - it maintains the current behaviour and will simplify the next steps.
- Investigate the impact of using "out-of-bounds" instead of "in-bounds".
- Look into value-bounds-analysis as a tool to infer that the indices are indeed
0
, even when using SSA values (that will often be the case in practice).
As a specific example, this was generated from ransform-vector.mlir:
%0 = scf.for %arg3 = %c0 to %c8 step %c8 iter_args(%arg4 = %arg2) -> (tensor<8x32xf32>) {
%1 = scf.for %arg5 = %c0 to %c32 step %c4 iter_args(%arg6 = %arg4) -> (tensor<8x32xf32>) {
%2 = scf.for %arg7 = %c0 to %c16 step %c2 iter_args(%arg8 = %arg6) -> (tensor<8x32xf32>) {
%3 = vector.transfer_read %arg0[%arg3, %arg7], %cst {in_bounds = [true, true]} : tensor<8x16xf32>, vector<8x2xf32>
%4 = vector.transfer_read %arg1[%arg7, %arg5], %cst {in_bounds = [true, true]} : tensor<16x32xf32>, vector<2x4xf32>
%5 = vector.transfer_read %arg8[%arg3, %arg5], %cst {in_bounds = [true, true]} : tensor<8x32xf32>, vector<8x4xf32>
%6 = vector.contract {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %3, %4, %5 : vector<8x2xf32>, vector<2x4xf32> into vector<8x4xf32>
%7 = vector.transfer_write %6, %arg8[%arg3, %arg5] {in_bounds = [false, false]} : vector<8x4xf32>, tensor<8x32xf32>
scf.yield %7 : tensor<8x32xf32>
}
scf.yield %2 : tensor<8x32xf32>
}
scf.yield %1 : tensor<8x32xf32>
}
return %0 : tensor<8x32xf32>
}
It's quite clear that the range of indices for this vector.transfer_write
make it always "in-bounds":
%7 = vector.transfer_write %6, %arg8[%arg3, %arg5] {in_bounds = [false, false]} : vector<8x4xf32>, tensor<8x32xf32>