Description
(For the original problem report, see https://discuss.tvm.apache.org/t/most-tasks-failed-with-autoscheduler-on-mali-g610-gpu/16139)
TVM defaults to running every task measurement for at least 1000 ms on non-CPU targets, but for very fast tasks this means the kernel is repeated far too many times. On the OpenCL target, each repeat is a kernel launch command enqueued via clEnqueueNDRangeKernel, so the queue grows until the runtime reports an out-of-memory error (or the driver simply hangs, which surfaces as a timeout).
Lowering min_repeat_ms from its default of 1000 works around the issue; however, there should be an upper limit on the repeat count for a single kernel.
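To see why fast kernels blow up the launch count, here is a minimal sketch of the repeat logic implied by min_repeat_ms (illustrative only; `launches_needed` is a hypothetical helper, not TVM's actual measurement code): the runner keeps launching the kernel until the accumulated run time reaches the threshold, so the number of enqueued launches scales inversely with the kernel's run time.

```python
import math

def launches_needed(min_repeat_ms: float, kernel_time_ms: float) -> int:
    """Roughly how many kernel launches one measurement enqueues so that the
    total run time reaches min_repeat_ms (illustrative, not TVM's exact code)."""
    return max(1, math.ceil(min_repeat_ms / kernel_time_ms))

# A tiny elementwise add may take ~0.01 ms; with the default min_repeat_ms=1000
# a single measurement enqueues on the order of 100000 launches:
print(launches_needed(1000, 0.01))  # 100000
print(launches_needed(100, 0.01))   # 10000 -- lowering min_repeat_ms helps,
                                    # but a hard cap on the count would too
```

This is why the workaround of lowering min_repeat_ms helps, and why a cap on the maximum repeat count would fix the problem regardless of how fast the kernel is.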
Expected behavior
The AutoScheduler runs without problems.
Actual behavior
Every measurement fails with this error: InternalError: Check failed: (e == CL_SUCCESS) is false: OpenCL Error, code=-6: CL_OUT_OF_HOST_MEMORY
Environment
RK3588 SoC with Mali-G610 MP4 GPU
ARM vendor GPU driver, OpenCL 3.0
Debian 11
TVM master branch
Steps to reproduce
- Try the lightglue model from the original forum link, or run the script below:

```python
import tvm
from tvm import auto_scheduler, te

# simple add operator, runs very fast
@auto_scheduler.register_workload
def simple_add(N, A, B):
    a = te.placeholder((N,), name="a")
    b = te.placeholder((N,), name="b")
    c = te.compute(a.shape, lambda i: a[i] + b[i], name="c")
    return [a, b, c]

if __name__ == "__main__":
    target = tvm.target.Target(target="opencl", host="llvm")
    task = auto_scheduler.SearchTask(func=simple_add, args=(1024, 1024, 1024), target=target)
    tune_options = auto_scheduler.TuningOptions(
        num_measure_trials=64,
        num_measures_per_round=16,
        verbose=5,
    )
    task.tune(tune_options)
```
Triage
backend: opencl
tune: auto_scheduler