Description

🚀 Feature

Enable `thunder` to autotune and select kernel implementations based on input shapes.
Motivation

Different kernel implementations for an op can perform better on different input shapes. Manually switching kernels is tedious; `thunder` could automate this for better performance, especially with dynamic shapes.

For instance, if I have `my_op_impl_A` (for small inputs) and `my_op_impl_B` (for large inputs), I'd like `thunder` to pick the right one automatically at runtime based on the actual input tensor shapes.
Pitch

- Register Implementations: Allow users to provide multiple kernel versions for an operation.
- Autotune: `thunder` profiles these kernels with varying input shapes (user-provided or inferred).
- Auto-Select: Based on tuning results, `thunder` dynamically dispatches to the best kernel for the current input shape during execution.
- Cache Results: Store tuning outcomes to avoid re-profiling for similar shapes.

This makes `thunder` an intelligent dispatcher, boosting performance without manual kernel management.
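To make the pitch concrete, here is a minimal sketch of a register/autotune/select/cache flow. Everything here is hypothetical (`AutotuningDispatcher`, `my_op_impl_a`, `my_op_impl_b`, and the power-of-two shape bucketing are illustrative, not `thunder` API); a real implementation would hook into the trace/executor machinery rather than dispatch in eager Python:

```python
import time
import torch

def my_op_impl_a(x):  # hypothetical kernel, assumed better for small inputs
    return x * 2

def my_op_impl_b(x):  # hypothetical kernel, assumed better for large inputs
    return x + x

class AutotuningDispatcher:
    """Profiles registered kernels per shape bucket and caches the winner."""

    def __init__(self):
        self.impls = []
        self.cache = {}  # shape bucket -> fastest impl

    def register(self, impl):
        self.impls.append(impl)

    def _bucket(self, x):
        # Bucket shapes by element count (rounded to a power of two) so
        # similar shapes reuse tuning results instead of re-profiling.
        return max(x.numel(), 1).bit_length()

    def __call__(self, x):
        key = self._bucket(x)
        if key not in self.cache:
            timings = []
            for impl in self.impls:
                start = time.perf_counter()
                for _ in range(10):  # crude timing loop for illustration
                    impl(x)
                timings.append((time.perf_counter() - start, impl))
            self.cache[key] = min(timings, key=lambda t: t[0])[1]
        return self.cache[key](x)

my_op = AutotuningDispatcher()
my_op.register(my_op_impl_a)
my_op.register(my_op_impl_b)

my_op(torch.randn(16))       # profiles both impls once for this bucket
my_op(torch.randn(1 << 20))  # different bucket -> profiled separately
```

The shape-bucketing choice is the interesting design knob: too coarse and a suboptimal kernel wins for part of the bucket, too fine and tuning overhead dominates.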
Alternatives

- Manual `if/else` in Python: Clunky and doesn't integrate well with JIT.
- Single "good enough" kernel: Misses optimization opportunities.

An integrated autotuning/selection feature in `thunder` would be superior.
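For comparison, the manual dispatch alternative might look like the sketch below (the kernel names and the threshold are hypothetical). The cutoff is guessed rather than measured, and the Python branch is opaque to a tracing JIT, which is exactly what makes this approach clunky:

```python
import torch

SMALL_CUTOFF = 4096  # hand-picked threshold, not derived from profiling

def my_op_impl_a(x):  # hypothetical kernel tuned for small inputs
    return x * 2

def my_op_impl_b(x):  # hypothetical kernel tuned for large inputs
    return x + x

def my_op(x):
    # Manual shape-based dispatch: works, but the threshold must be
    # re-tuned by hand for every new GPU, dtype, or kernel revision.
    if x.numel() < SMALL_CUTOFF:
        return my_op_impl_a(x)
    return my_op_impl_b(x)
```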
Additional context

This capability is present in libraries like TVM and TensorRT. Adding it to `thunder` would significantly enhance its utility for PyTorch acceleration, particularly for users with custom ops or those seeking peak performance.