Consider the following hypothetical example:
import tvm
N = tvm.var("N")
y = tvm.reduce_axis((0,N), 'y')
A = tvm.placeholder((2*N,2*N), "float32", "A")
C = tvm.compute((N, N), lambda i,j: A[2*i,j] + A[i+1,j], name='C')
s = tvm.create_schedule(C.op)
AA = s.cache_read(A, "local", [C])
s[AA].compute_at(s[C], s[C].op.axis[0])
print(tvm.lower(s, [A,C], simple_mode=True))
The resulting lowered code is as follows. Note that the size of A.local depends on i, which is a loop variable. I think the problem is that the two references to A, i.e., A[2*i,j] and A[i+1,j], use different functions of i (2*i vs. i+1) in their first dimension. That makes calculating the buffer size more complicated, and TVM probably doesn't check for that. We should either handle it properly or emit an error message saying that such indexing is not supported.
// attr [A.local] storage_scope = "local"
allocate A.local[float32 * ((max((i*2), (i + 1)) - min((i*2), (i + 1))) + 1) * N]
produce C {
  for (i, 0, N) {
    produce A.local {
      for (ax0, 0, ((max((i*2), (i + 1)) - min((i*2), (i + 1))) + 1)) {
        for (ax1, 0, N) {
          if (likely((min((i*2), (i + 1)) < ((N*2) - ax0)))) {
            if (likely((ax1 < (N*2)))) {
              A.local[((ax0*N) + ax1)] = A[((((min((i*2), (i + 1)) + ax0)*N)*2) + ax1)]
            }
          }
        }
      }
    }
    for (j, 0, N) {
      C[((i*N) + j)] = (A.local[((max((i + -1), 0)*N) + j)] + A.local[(((1 - min(i, 1))*N) + j)])
    }
  }
}
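To make the dependence on i concrete, here is a minimal plain-Python sketch (no TVM required) that evaluates the first-dimension extent from the allocate statement above, max(i*2, i+1) - min(i*2, i+1) + 1, for a few values of i. The helper names cached_extent and worst_case_extent are hypothetical and are not part of TVM. The second helper only illustrates one conceivable way to get a loop-invariant size, by taking the maximum over the whole loop range; it is not a claim about how TVM does or should handle this.

```python
def cached_extent(i):
    """Number of rows of A spanned by the accesses A[2*i, j] and A[i+1, j].

    Mirrors the expression in the lowered code:
    max(i*2, i+1) - min(i*2, i+1) + 1
    """
    lo = min(2 * i, i + 1)
    hi = max(2 * i, i + 1)
    return hi - lo + 1


def worst_case_extent(n):
    """Hypothetical loop-invariant bound: largest extent over i in [0, n)."""
    return max(cached_extent(i) for i in range(n))


# The extent changes from one iteration of i to the next, so it cannot
# be used as a fixed allocation size outside the i loop.
extents = [cached_extent(i) for i in range(6)]
print(extents)               # [2, 1, 2, 3, 4, 5]
print(worst_case_extent(6))  # 5
```

For non-negative i the extent simplifies to abs(i - 1) + 1, so it grows linearly with i and the region read per iteration is not a fixed-size window.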