Release Plan for v0.2.0
- Release Manager: TBD
- Code Freeze Date: TBD
- Test Verification/Bug Bash:
- Release Date:
- Release Note:
Features (P0)
- Explicit warp specialization
- Tile scheduler
- Support transposeB=False for ROCm
- Correctness Evaluation
- Layout Swizzling
Kernels (P0)
- Implement Flash MLA kernel
  - init version
  - optimize to SoTA
  - MI300
- Implement NSA kernel
  - init version
  - decoding
  - varlen
  - fuse topk
  - bwd
  - MI300
- Implement Flash SeerAttention
  - init version
  - different q/kv seq
  - varlen
  - bwd
- Optimize TileLang Flash Attention kernel to SoTA
  - H100
  - MI300
- Complete support for commonly used attributes in Flash Attention
  - varlen
  - mask/bias
  - list all supported dims (benchmark)
  - fa3 dim 256 fwd + bwd
  - fa3 bwd (64, 128)
Backends #56
- Pass and Migrate CI to H100
- Fix fp16xfp4 dequant: testing/python/kernel/test_tilelang_kernel_dequantize_gemm.py: test_simple_impl_float16xfp4_gemm
- Fix tma load for float32: testing/python/kernel/test_tilelang_kernel_gemm.py: test_gemm_f32f32f32_nn
- Add support for WebGPU
- Add support for Metal
- Add support for Hexagon
Kernels
- Compare with DeepGEMM
- e2e example: kernel development flow
- Support FP8/INT8 T.gemm
- Add Examples to CI Test
- Optimize TileLang Flash Attention kernel to SoTA on A100
Features
- Nightly Build
- Update API: Replace all tilelang.lower calls with tilelang.compile in examples and tests.
- Reduce LLVM dependencies
- Provide prebuilt and PyPI packages for ROCm platforms
- Integrate TileLang with Torch Inductor
- Configure API access level to enable advanced features
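For the API update item above, a small scanner could help track which examples and tests still call tilelang.lower. This is only a sketch: the helper names find_lower_calls and scan_tree are hypothetical, and the regex assumes the call is spelled literally as tilelang.lower( (aliased imports would be missed). It only reports call sites and does not rewrite them.

```python
import re
from pathlib import Path

# Assumption: call sites are spelled literally as `tilelang.lower(`.
# Aliased imports (e.g. `from tilelang import lower`) are not detected.
LOWER_CALL = re.compile(r"\btilelang\.lower\s*\(")


def find_lower_calls(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each line calling tilelang.lower."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(source.splitlines(), start=1)
        if LOWER_CALL.search(line)
    ]


def scan_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under root and collect the remaining call sites."""
    hits: dict[str, list[tuple[int, str]]] = {}
    for path in sorted(Path(root).rglob("*.py")):
        found = find_lower_calls(path.read_text(encoding="utf-8"))
        if found:
            hits[str(path)] = found
    return hits
```

Running scan_tree over the examples and tests directories would give a per-file checklist. Each hit is probably best migrated by hand rather than by text substitution, in case the two functions differ in arguments or return values.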
Cost Model
- Integrate Cost Model Carver into auto-tuning