Merged
298 commits
a961173
[Index] Relocate Int64 Auto Promoter to ConfigBitWidth Pass, removing…
LeiWang1999 Aug 13, 2025
084ab9e
[CI] Bind build-test CI to NVIDIA as AMD runners are being introduced…
LeiWang1999 Aug 14, 2025
6610c7b
fix: NVRTC backend (#717)
lucifer1004 Aug 14, 2025
f5fca05
[CUDA] Init support for sm_120 (#716)
oraluben Aug 14, 2025
6545b08
[CI] fix docs ci (#720)
xwhzz Aug 15, 2025
d074286
[Chore] fix typos (#719)
lucifer1004 Aug 15, 2025
8e1b88f
[CI][AMD] Add AMD GPU CI and fix some related bugs (#694)
Alex4210987 Aug 15, 2025
2bd2d69
[Carver][Bugfix] Correct score function for warp tile selection in te…
NaOHCC Aug 15, 2025
c369d69
[Refactor] Refactor CUDA code generation to simplify eviction policy …
LeiWang1999 Aug 16, 2025
1b308ba
[Language] Introduce `StridedTensor` to support non contigious torch …
LeiWang1999 Aug 17, 2025
f4a828f
[Enhancement][Bugfix] Fix bug in warp specialized pass and add gemm_s…
xwhzz Aug 18, 2025
a5074fd
📝 Add docstrings to `fix` (#726)
coderabbitai[bot] Aug 18, 2025
a86223f
[CI] Fix AMD CI (#729)
Alex4210987 Aug 18, 2025
24603e4
[Feature] Low-bit twiddling dequantization and FP4 GEMM (#725)
tzj-fxz Aug 19, 2025
e3a80b7
📝 Add docstrings to `mxfp4` (#732)
coderabbitai[bot] Aug 19, 2025
72be490
[Refactor] Refactor env into a more flexible version (#740)
LeiWang1999 Aug 19, 2025
fff24ae
[Enhancement] Add stride index validation in CythonKernelWrapper (#743)
LeiWang1999 Aug 20, 2025
ce7b932
[Bugfix]:Fix atomic add auto vectorize memory access out of bound err…
yyttt6 Aug 20, 2025
eccdfe1
📝 Add docstrings to PR #744 (#745)
coderabbitai[bot] Aug 21, 2025
cb37bfe
[Refactor] Refactor barrier management (#744)
LeiWang1999 Aug 21, 2025
5c11d24
[Refactor] Merge bulk copy into copy and improve layout inference for…
LeiWang1999 Aug 22, 2025
6b12502
[Refactor] Merge ThreadPartialSync and ThreadStorageSync (#741)
LeiWang1999 Aug 23, 2025
e835762
[Enhancement] Optimize loop body handling in IR (#749)
chengyupku Aug 23, 2025
796b3bb
[MXFP4] Fix bugs and optimize exponential operation (#750)
tzj-fxz Aug 23, 2025
e68fdab
[Enhancement] Add DispatchInstruction specialization for fp8 types in…
LeiWang1999 Aug 24, 2025
c2fe91e
[Enhancement] Add shape checking for reduce options (#748)
kurisu6912 Aug 24, 2025
cf7be05
[Bugfix] Add missing FP8 header include (#752)
LeiWang1999 Aug 24, 2025
fd199a4
[MXFP4] Add bias to MXFP4 GEMM kernel (#753)
tzj-fxz Aug 24, 2025
b39aaf5
[Bugfix][WS] Consider loop min extent when computing phase id (#754)
LeiWang1999 Aug 24, 2025
556d411
[Typo] Remove `disable_cache` in some tests (#755)
LeiWang1999 Aug 24, 2025
e0cf5fe
[README] Update GDN README for clarity and add acknowledgements (#758)
chengyupku Aug 25, 2025
e05a20a
cutlass v4.2.0 supporting cuda 13 (#760)
johnnynunez Aug 26, 2025
1774a1a
[Feature] Add 1D TMA support (#761)
tzj-fxz Aug 28, 2025
3705141
[Example] Add vertical slash sparse attention pattern (#762)
xwhzz Aug 28, 2025
ff35fc0
[Bugfix] Address PassContext contamination from CI and fix incorrect …
xwhzz Aug 28, 2025
ea54830
[MXFP4] Add 1D TMA copy for Scale tensor in MXFP4 GEMM (#766)
tzj-fxz Aug 28, 2025
277ed53
hot fix blackwell (#768)
johnnynunez Aug 29, 2025
b38bd69
[Refactor] Refactor `Operator` into `TileOperator` and with tvm refle…
LeiWang1999 Aug 29, 2025
8eab775
[Reducer] Introduce `alloc_reducer` to separate inter and intra warp …
LeiWang1999 Aug 31, 2025
2af3f22
📝 Add docstrings to `pytile_0826` (#770)
coderabbitai[bot] Aug 31, 2025
a7a29c0
[Bugfix]:Fix atomic add auto vectorize negative optimization (#765)
yyttt6 Aug 31, 2025
9a86939
📝 Add docstrings to `reducer_0825` (#772)
coderabbitai[bot] Aug 31, 2025
03f2198
Allow fill global buffer (#774)
kurisu6912 Sep 1, 2025
68af215
[BugFix] Refactor the op check in LowerTileOp pass using the member f…
tzj-fxz Sep 1, 2025
471cc7f
add bf16 exp fallback (#776)
xwhzz Sep 1, 2025
cdc5d8d
[Lint] Introduce clang-tidy into format.sh (#777)
LeiWang1999 Sep 2, 2025
7ffc5b4
[Cache] Introduce detailed target information for the disk kernel cac…
LeiWang1999 Sep 2, 2025
021e44e
[Example]Adds example for top-k operation (#775)
Cunxiao2002 Sep 2, 2025
b66f9aa
[Math] Dispatch `T.rsqrt(x)` into cuda intrin instead of `1 / T.sqrt(…
LeiWang1999 Sep 2, 2025
141e01f
[CI] Adds pytest-durations for test timing (#782)
Cunxiao2002 Sep 3, 2025
3cfefc8
[Refactor] Support python reflection for tile operators (#783)
LeiWang1999 Sep 4, 2025
f07f31c
[AMD] Fix amd tir&add examples (#784)
Alex4210987 Sep 4, 2025
6e0c350
[Nvidia][SM121] Add intrin.h include to gemm_mma.h for sm120+(#785)
HaoKang-Timmy Sep 4, 2025
e5b61e9
[Feat] Add tilelang T.assume support and assume injection for buffer …
kurisu6912 Sep 5, 2025
013adca
[Bugfix] Fix incorrect synchronization bug in minference example (#786)
xwhzz Sep 5, 2025
cda5ea1
[AMD] fix bugs in warp shuffle (#790)
txs19991 Sep 5, 2025
b6b02da
[AMD] fix mfma op interface (#791)
Paran0idy Sep 6, 2025
9d7d45b
[TMA] Automatically lower 1d tma in appropriate cases (#788)
LeiWang1999 Sep 6, 2025
bcfc834
[CI]Adds pytest timeout to CI (#792)
Cunxiao2002 Sep 6, 2025
7467f2b
Resolve reference cycle. (#795)
LeiWang1999 Sep 9, 2025
54aaec9
Refactor index handling in BufferStore and BufferLoad to promote 64-b…
LeiWang1999 Sep 9, 2025
9fd6bb3
[AMD] support mfma i32_16x16x32_i8 (#800)
Paran0idy Sep 10, 2025
91a7bb2
[TileOp] Introduce a experimental python defined `T.gemm_v2` (#793)
LeiWang1999 Sep 10, 2025
5529363
[Bugfix] Expose alloc_reducer definition to the python side (#802)
LeiWang1999 Sep 11, 2025
b62a0b4
[Refactor] Use new namespace and enhance dispatch macros for mma (#801)
LeiWang1999 Sep 11, 2025
409ab83
[AMD] support fp8 T.gemm (#804)
txs19991 Sep 11, 2025
143b522
[AMD] support preshuffle weight mfma (#806)
Paran0idy Sep 12, 2025
4d54854
Add pytest-durations to requirements for ROCm (#810)
Alex4210987 Sep 12, 2025
5e52952
[Lint] Add ruff config to check for useless spaces (#807)
oraluben Sep 13, 2025
ae9b706
[Feature] Add ptx_cp_async_barrier_noinc intrinsic and related functi…
chengyupku Sep 14, 2025
f0d6669
[Fix] Fix lower bug when buffer store is not guarded by any tile op (…
kurisu6912 Sep 14, 2025
0b3683b
[feat] support gemm_sp for ampere and ada arch (#691)
botbw Sep 15, 2025
8b00522
[Refactor] Update TVM subproject and refactor BlockNode handling in w…
chengyupku Sep 15, 2025
5c869bc
[Refactor] Reopen #794 Fix lower bug when buffer store is not guarded…
kurisu6912 Sep 15, 2025
85d1a6b
[Refactor] Update TVM subproject and streamline buffer store handling…
chengyupku Sep 15, 2025
4bcb159
[Example] add w4a8 gemm kernel (#815)
Cunxiao2002 Sep 16, 2025
d3e75b7
[CI] fix rocm ci (#819)
Cunxiao2002 Sep 16, 2025
907c3ff
[Example] Remove redundant param (#821)
botbw Sep 16, 2025
1547995
[DSL] Support python tenary if then else expression (#822)
LeiWang1999 Sep 17, 2025
a57f827
[Bugfix] Bug fix when git command is not installed (#823)
LeiWang1999 Sep 17, 2025
e4a346f
[Bugfix] Skip fp4 dtype binding when using older versions of ml_dtype…
LeiWang1999 Sep 17, 2025
8554cb0
[Enhancement] Add a MXFP4 grouped GEMM example for FusedMoE (#811)
Rachmanino Sep 17, 2025
2f7dc52
[CMake] Added support for statically linked system libc library (#825)
LeiWang1999 Sep 17, 2025
232782d
[Refactor] Refactor some build related configurations (#827)
LeiWang1999 Sep 18, 2025
ebea77d
[CI] Test Fix: Handle BufferLoad nodes when T.gemm input has a stride…
LeiWang1999 Sep 18, 2025
e7e3835
[Refactor] Turn off `ENABLE_FAST_MATH` by default (#846)
LeiWang1999 Sep 18, 2025
6efeb74
[AMD] fix bf16x2 dtype codegen (#847)
Paran0idy Sep 18, 2025
c36a7ee
[Typing] Fallback from Python 3.10+ type syntax for compatibility (#848)
LeiWang1999 Sep 18, 2025
8cc2ab2
[TIR] Refactor division simplification in RewriteSimplifier (#849)
LeiWang1999 Sep 18, 2025
bc9623f
[Py38] Revert typing and parser updates for Python 3.8 compatibility …
LeiWang1999 Sep 19, 2025
094e229
[Refactor] Enhance buffer store transformation in TIR pass (#851)
LeiWang1999 Sep 19, 2025
1ad6e46
[Release] Bump Version to 0.1.6 (#818)
LeiWang1999 Sep 19, 2025
a3497eb
[PATCH] Static libg++ linking fix (#854)
LeiWang1999 Sep 21, 2025
bd16865
[Analyzer] Enhance ConstIntBoundAnalyzer and IntervalSet with modular…
LeiWang1999 Sep 22, 2025
058a670
[Doc] Optimize the quickstart guide for clarity and not just for CUDA…
LeiWang1999 Sep 22, 2025
b9a51c4
[TMA] Bugfix when a shared buffer is both issued with tma store and t…
LeiWang1999 Sep 22, 2025
3b21a67
[AMD][MLA] Fix mla autotune for rocm (#861)
LeiWang1999 Sep 22, 2025
b12a63c
[Bugfix] Ensure correct handling for cases where `seq_q<seq_kv` in f…
Rachmanino Sep 23, 2025
48c9a35
[AMD] refactor MatrixCoreIntrinEmitter (#860)
Paran0idy Sep 23, 2025
86aaf3c
Add fast sine and cosine definitions in common.h for CUDA templates (…
Rachmanino Sep 23, 2025
9cbbbbc
[Layout] Support layout forward with multi dimension (#867)
LeiWang1999 Sep 23, 2025
b448309
[Autotune][Conv] optimize convolution examples to use autotune (#866)
LeiWang1999 Sep 23, 2025
d9a171c
[Example] Add examples to support efficient attention sink forward pr…
Rachmanino Sep 23, 2025
fa4fd0b
[Parser] Adapt Parser to work with Python 3.8 in some cases (#869)
LeiWang1999 Sep 24, 2025
2d4b848
[Fix] tilelang can now vectorize `B[i,j] = c[i] + A[i,j]` (#798)
kurisu6912 Sep 24, 2025
c538d8a
[Language] Support sequence comparisons (#872)
LeiWang1999 Sep 25, 2025
15a303d
[Language] Support loop_break primitive (#873)
chengyupku Sep 25, 2025
1dfac2e
[Bugfix] Use `ExprDeepEqual` instead of `StructuralEqual` when merge …
LeiWang1999 Sep 25, 2025
aa0b109
[Language] Support atomic add with ret (#870)
LeiWang1999 Sep 25, 2025
6f6ef7a
[Cython] Remove an incorrect check (#880)
LJC00118 Sep 26, 2025
56f7494
[CI][AMD] Remove amd Timeout test (#881)
Alex4210987 Sep 26, 2025
95c373f
[FastMath] Disable default TVM fastmath intrinsic dispatch and add ex…
LeiWang1999 Sep 26, 2025
ec24561
[Example] Add efficient attention sink backward implementations and t…
Rachmanino Sep 26, 2025
a58bf9b
[Precision] Introduce `T.ieee_rsqrt` and related high precision op (#…
LeiWang1999 Sep 26, 2025
c861d8a
[Dist] Provide an option to include commit ID in version (#884)
LeiWang1999 Sep 26, 2025
bf67fb1
[Example] Optimize sink attention forward via swizzled layout and rep…
Rachmanino Sep 26, 2025
c382dcb
[Layout] Introduce Flexible Parallel to Support T.serial and local bu…
LeiWang1999 Sep 26, 2025
f58bcd4
[SM100] Add sm100 GEMM layouts and tcgen05 support (#887)
Hamerlate Sep 28, 2025
599264c
[Bugfix] Fix CopyNode Lower method to include disable_tma flag in Get…
Rachmanino Sep 28, 2025
6c67a77
[Layout] fix plot layout (#890)
Paran0idy Sep 29, 2025
4424fa9
[Example] Add example (#894)
LeiWang1999 Sep 29, 2025
78664e2
[News] Add announcement of support for Huawei Ascend chips (#895)
xwhzz Sep 29, 2025
65ac745
[Example] Add sparse mla examples (#896)
LeiWang1999 Sep 29, 2025
d19fe1a
[Typo] Fix backend name for Huawei Ascend (#898)
xwhzz Sep 29, 2025
54fc6ba
[CI] Legalize math related test (#899)
LeiWang1999 Sep 29, 2025
1656115
[Bugfix] Fix flops comp and softmax scale in mla (#900)
Edenzzzz Sep 29, 2025
6021ef3
[Example] Add topk into sparse mla example and append some docs (#901)
LeiWang1999 Sep 29, 2025
f92de93
[Typo] Fix branch name & link for AscendNPU IR in latest news (#907)
xwhzz Sep 30, 2025
3ad6202
[Example] Specify a fixed commit for the flash-linear-attention repos…
LeiWang1999 Sep 30, 2025
a35ac49
[CI] optimize CI time for sparse gemm (#906)
botbw Sep 30, 2025
f737fa9
[Enhancement] Include compile flags into the hash key of cached kerne…
Rachmanino Oct 1, 2025
1b4cd38
[Bugfix] Fix saving kernel source code where JITKernel.artifact is No…
zjudmd1015 Oct 1, 2025
9d38297
[CI] Refactor import paths in dequantization examples to use dequanti…
LeiWang1999 Oct 1, 2025
8150e47
[Example] Add MLA decode ws example (#928)
chengyupku Oct 1, 2025
f09e91e
[CI] Fix documentation runner by adding 'nvidia' tag
xwhzz Oct 1, 2025
fc4bd45
[Layout] Strict annotate completed replicated layout for fragment wit…
LeiWang1999 Oct 2, 2025
5ccac4f
[Bugfix] Fix tensor memory copy layout (#933)
Hamerlate Oct 2, 2025
242cb45
[Example] Optimize online_softmax example (#934)
lijinpei Oct 4, 2025
d5c88af
[Example] Add correctness assert into dsa example (#937)
LeiWang1999 Oct 4, 2025
b31de0c
[Enhancement] Enhance and add new GQA backward examples for Hopper (#…
Rachmanino Oct 4, 2025
95170ab
[Enhancement] Fix lint to improve grouped GEMM performance with TMA (…
Cunxiao2002 Oct 5, 2025
557589f
[Example] Introduce split+sum template, and optimize `atomic_add` per…
LeiWang1999 Oct 5, 2025
3aecab8
[Example] Disable TMA and enable FastMath for NSA Examples (#941)
LeiWang1999 Oct 5, 2025
481cae4
[Example] Revert the atomic/split&sum templates in MHA backward examp…
Rachmanino Oct 6, 2025
ac8c9af
[Example] Add sparse mla bwd example for deepseek_v32 (#919)
Zhichenzzz Oct 6, 2025
91d5ef5
[Profiler] Adds CUPTI profiler support (#936)
Cunxiao2002 Oct 6, 2025
c61971e
[Enhancement] Add buffer load copy functions and improve copy logic i…
LeiWang1999 Oct 7, 2025
394e17d
[Refactor] Refine nvrtc compile related check style (#945)
BBuf Oct 7, 2025
7fb0677
[Backend] Add metal backend (#799)
oraluben Oct 7, 2025
f6d4bd3
[CI] enable dependabot for GHA workflows (#950)
XuehaiPan Oct 9, 2025
07f6210
Modify the SM architecture number to support Thor’s sm110. (#957)
iloveai8086 Oct 9, 2025
9a7cda4
[CI] auto-cancel in-progress PR CI when new commits are pushed (#956)
XuehaiPan Oct 9, 2025
6b2bb31
[Bugfix] Fix type object is not subscriptable in py38 (#959)
BBuf Oct 9, 2025
2dea17e
[Bugfix][Doc] Add astroid version constraint to requirements.txt (#958)
xwhzz Oct 9, 2025
d8fedc1
[CI]: Bump actions/setup-python from 2 to 6 (#951)
dependabot[bot] Oct 9, 2025
b6f90d2
[CI]: Bump astral-sh/setup-uv from 6 to 7 (#952)
dependabot[bot] Oct 9, 2025
5d881a5
[CI]: Bump actions/github-script from 7 to 8 (#954)
dependabot[bot] Oct 9, 2025
10adb79
[CI]: Bump actions/checkout from 2 to 5 (#953)
dependabot[bot] Oct 9, 2025
a13cde2
[TileOp] Implement WGMMA for T.gemm_v2 (#813)
LeiWang1999 Oct 9, 2025
8f07b9b
[Docs] add CODE_OF_CONDUCT.md (#965)
XuehaiPan Oct 10, 2025
7cd0da9
[Example] Add support for `bfloat16` and user-defined `sm_scale` in a…
Rachmanino Oct 10, 2025
f8ae600
[Bugfix] Do not force inline let stmt (#947)
LeiWang1999 Oct 10, 2025
8fe3540
[CI] add `pre-commit` integration (#955)
XuehaiPan Oct 10, 2025
6031416
[Doc] Install docs add docker install method (#961)
BBuf Oct 10, 2025
7913fb1
[Bugfix] Fix dummy kernel compliation (#962)
SiriusNEO Oct 10, 2025
0ae183d
[CI][Refactor] Refactor non-test CI workflow files (#971)
XuehaiPan Oct 11, 2025
747381a
[TileOp] Implememt `CumSum1D` (#978)
LeiWang1999 Oct 11, 2025
77e31e5
[Language] Enhance `T.alloc_var` for AugAssign and AnnAsign (#979)
LeiWang1999 Oct 11, 2025
ddfaac3
[Refactor] Refactor Pass `InjectFenceProxy` and expose some warp grou…
LeiWang1999 Oct 11, 2025
117f2b8
[Typo] Remove debug print (#980)
LeiWang1999 Oct 11, 2025
77b9d08
[Bugfix] Use `access_ptr("r")` instead of `access_ptr("w")` for corre…
LeiWang1999 Oct 11, 2025
0550703
[Feature][Example] Support TMA reduce operation and update GQA bwd ex…
chengyupku Oct 11, 2025
b0b5347
[Bugfix] Add NVIDIA HPC SDK support in CUDA detection (#974) (#976)
Degeneracy-Evil Oct 12, 2025
fc41463
[BugFix] Robust gemm policy for sparse_mla_fwd in Hopper and Ada Love…
tzj-fxz Oct 12, 2025
4a229dd
[Bugfix] Fallback `torch.accelerator.synchronize()` to `torch.cuda.sy…
yyttt6 Oct 12, 2025
340bfc5
[Bugfix] Fix atomicadd auto vectorize identify var error (#883)
yyttt6 Oct 13, 2025
bab57f2
[CI] Speed up sparse tensor core test via vectorized generating spars…
LeiWang1999 Oct 13, 2025
d89ba5b
[Build] Migrate to scikit-build-core (#939)
oraluben Oct 13, 2025
eb37e45
[CI] Removes redundant environment variable (#1020)
Cunxiao2002 Oct 13, 2025
7a5077e
[Transform] Migrate `LowerIntrin` from tvm into tilelang (#999)
LeiWang1999 Oct 14, 2025
d684094
[Lint] Prefer American English spelling (#1022)
XuehaiPan Oct 14, 2025
0f515b8
[Build] Prefer libs from local build dir (#1027)
oraluben Oct 14, 2025
e59e7f9
[Language] Support Consequential assignments like 'a = b = c = 1' (#992)
LeiWang1999 Oct 14, 2025
2ada4ec
[CI] Removes debug print statements from the example. (#1030)
Cunxiao2002 Oct 14, 2025
1e8f0b1
[Enhancement] Update abs function for half_t and bfloat_t to use cutl…
Rachmanino Oct 14, 2025
eed320f
[Bugfix] Recover code for flexible parallel (#1032)
LeiWang1999 Oct 14, 2025
5767475
[CI] Disable buggy(maybe) warp specialized kernel ci test for H20 (#1…
LeiWang1999 Oct 14, 2025
e539952
[TIR] Revert some changes of Pass `LowerIntrin` (#1035)
LeiWang1999 Oct 15, 2025
c67f73b
[Env] Optimize the mechanism for locating `TL_LIBS` (#1038)
LeiWang1999 Oct 15, 2025
32ddc1a
[CUDA] Add pack functions for FP8 types (#967)
LJC00118 Oct 15, 2025
b78d840
[Language] Expose `T.get_warp_idx_sync` and `T.shuffle_elect` for eff…
LeiWang1999 Oct 15, 2025
80665cd
fix bug&add amd examples (#966)
Alex4210987 Oct 15, 2025
8ce2778
[CI][Refactor] Merge test CI workflow files into one (#973)
XuehaiPan Oct 15, 2025
8f001e0
[BugFix] Phaseout dependency of Triton in sink examples to make CI ha…
Rachmanino Oct 15, 2025
bd1c7b3
[Refactor] Use `has_simt_copy` to decide whether to insert `set_max_n…
chengyupku Oct 15, 2025
0ff4f42
[Feature]: Add test for atomicadd auto vectorize and remove useless c…
yyttt6 Oct 16, 2025
e3742d3
Allow mma gemm for all cuda (#1047)
oraluben Oct 16, 2025
1f4ffdb
[Bugfix] Improves compatibility when checking for MPS availability in…
LeiWang1999 Oct 16, 2025
a79bc5c
[CI] Fix ROCm CI (#1043)
XuehaiPan Oct 16, 2025
cc00fb6
[Enhancement] Add support for symbolic dimensions in Cython kernel ad…
Rachmanino Oct 17, 2025
fd1493b
Automatically initialize submodule if missing (#1052)
LeiWang1999 Oct 17, 2025
35cf888
[Enhancement] Remove constraint requiring last dimension stride to be…
LJC00118 Oct 17, 2025
1281d6f
[CI] Disable autofix for pre-commit CI (#1053)
LeiWang1999 Oct 17, 2025
37b3dbd
[Enhancement] Improve CUDA compiler detection in CMake (#1054)
LJC00118 Oct 17, 2025
278c0fb
[Enhancement] Introduce a workaround for layout inference for local b…
LeiWang1999 Oct 17, 2025
7211164
[Refactor] Refactor Pass `LegalizeSafeMemoryAccess` to support recurs…
SiriusNEO Oct 17, 2025
bf2de5b
Making version parser more robust against missing or unavailable meta…
LeiWang1999 Oct 18, 2025
759c2e3
[DOC] Add document for develop with PYTHONPATH (#1062)
LeiWang1999 Oct 18, 2025
4ca6c13
[CI]:Reduce test shapes to avoid OOM errors during CI. (#1060)
yyttt6 Oct 18, 2025
fb8b3af
[Benchmark] Add H800 SXM Benchmark results (#1063)
LeiWang1999 Oct 19, 2025
b7dfdb3
[Misc] Add GitHub issue templates (#1057)
XuehaiPan Oct 19, 2025
ae9a6f0
[Refactor][Example] Update linear attention examples and add tests (#…
Rachmanino Oct 19, 2025
17bd0a6
[Enhancement] Deprecate split&sum in attn bwd examples on Hopper and …
Rachmanino Oct 19, 2025
b2acfc3
[Benchmark] Add matmul FP16 benchmark results (#1067)
LeiWang1999 Oct 19, 2025
e57ef58
[CI]: Bump actions/checkout from 4 to 5 (#1070)
dependabot[bot] Oct 20, 2025
d66b83c
[Example] Update GQA varlen fwd and MHA varlen fwd (#1071)
chengyupku Oct 20, 2025
27701c3
[Parallel] Support `T.Parallel` with dynamic extents (#990)
LeiWang1999 Oct 20, 2025
6a388c0
[Layout] Utilizing IsEqual instead of StructuralEqual (#1073)
LeiWang1999 Oct 20, 2025
1516f43
[Cache] raise errors for `tileang.clear_cache()` (#1077)
LeiWang1999 Oct 20, 2025
ba410ae
[Feature] Support Reduce operators for bitwise and/or/xor (#1074)
tzj-fxz Oct 20, 2025
fd6cec5
[Autotune] Add autotune coverage for symbolic M and normalize cache k…
LeiWang1999 Oct 20, 2025
a773027
[Language] Recommend using `T.dynamic` instead of `T.symbolic` (#1076)
LeiWang1999 Oct 20, 2025
bc37ea6
[Language] Efficient `T.reduce_` with shared memory input/output (#1080)
LeiWang1999 Oct 20, 2025
f8d3e73
[Bugfix] Fix missing reg alloc in custom warp specialization (#1084)
chengyupku Oct 20, 2025
bb8b3cd
[Enhancement] Update async intrinsic handling in inject_fence_proxy (…
Rachmanino Oct 20, 2025
792e5d5
[Feature] Add GQA backward kernel with varlen input (#1082)
tzj-fxz Oct 21, 2025
1d4b718
[BugFix] Add memory order argument for non-vectorized atomic add (#1081)
tzj-fxz Oct 21, 2025
60e9c7e
[Refactor] Rename cython output to `tilelang_cython` and relocate its…
LeiWang1999 Oct 21, 2025
42c267e
[Target] Enhance target selection helpers and documentation (#1085)
LeiWang1999 Oct 21, 2025
0c7e741
[Cleanup] Remove `tilelang.disable_cache()` calls from examples and t…
Rachmanino Oct 21, 2025
cdc67fc
[PassConfig] Introduce PassConfig `TL_STORAGE_REWRITE_DETECT_INPLACE`…
LeiWang1999 Oct 21, 2025
bddb125
[Language] Support tilelang `alloc_var(dtype, init=x)` (#1092)
LeiWang1999 Oct 21, 2025
5cb5c06
[Bugfix] Fix missing host cuTensorMapEncodeIm2col call (#1094)
chengyupku Oct 21, 2025
f003f37
[GQA] Add regional atomic add to slightly boost performance (#1093)
tzj-fxz Oct 21, 2025
514bdea
[Example] Add block level high performance gemv example (#1097)
LeiWang1999 Oct 22, 2025
151d9e6
[Refactor] Optimize debug message for parallel inference (#1096)
LeiWang1999 Oct 22, 2025
5683e6a
[CI][Lint] Retire `format.sh` and add `clang-tidy` to GHA workflow (#…
XuehaiPan Oct 22, 2025
8a5eb56
[Refactor] Use forceinline in `ldmatrix` and update mamba scan kernel…
chengyupku Oct 22, 2025
e28433e
[Maint] Update uncommitted change detection command in `format.sh` (#…
XuehaiPan Oct 22, 2025
717f7b5
[Benchmark] Add Mamba2_chunk_scan benchmark (#1109)
chengyupku Oct 22, 2025
4f3523d
[Benchmark] Update Mamba2_chunk_scan benchmark (#1110)
chengyupku Oct 22, 2025
f14fb11
[Lint] Enable pyupgrade linter in ruff (#963)
oraluben Oct 23, 2025
86c8bb4
[Refactor] Improve scalar handling in CopyNode and update loop partit…
LeiWang1999 Oct 23, 2025
a148d62
[Feature] Enhance vectorized conversion support in CUDA codegen (#1095)
Rachmanino Oct 23, 2025
2a382c8
Merge remote-tracking branch 'upstream/main' into merge1023
chengyupku Oct 23, 2025
c68a512
fix
chengyupku Oct 23, 2025
39dbb91
lint
chengyupku Oct 23, 2025
288c025
[Install] Use pyproject to install extensions
chengyupku Oct 24, 2025
b7a3898
[Install] Merge and into one extension
chengyupku Oct 24, 2025
6b201c6
lint
chengyupku Oct 24, 2025
8 changes: 8 additions & 0 deletions .clang-format
@@ -0,0 +1,8 @@
---
BasedOnStyle: LLVM
UseTab: Never
IndentWidth: 2
ColumnLimit: 80

Language: Cpp
Standard: c++17
62 changes: 55 additions & 7 deletions .clang-tidy
@@ -1,10 +1,58 @@
-Checks: >
+---
+InheritParentConfig: true
+ExtraArgs: ['-v']
+FormatStyle: file
+UseColor: true
+WarningsAsErrors: '*'
+ExcludeHeaderFilterRegex: '^(3rdparty|tvm)/.*$'
+
+# NOTE: there must be no spaces before the '-', so put the comma last.
+Checks: >-
+  # 1. Retained categories: easier to find bugs/performance issues
   clang-analyzer-*,
-  cppcoreguidelines-*,
-  modernize-*,
+  cppcoreguidelines-pro-type-static-cast-downcast,
+  cppcoreguidelines-pro-type-member-init,
+  cppcoreguidelines-pro-bounds-array-to-pointer-decay,
+  cppcoreguidelines-pro-bounds-pointer-arithmetic,
+  cppcoreguidelines-slicing,
+  cppcoreguidelines-narrowing-conversions,
   performance-*,
-  readability-*,
-  -readability-identifier-length
-WarningsAsErrors: '*'
 
-HeaderFilterRegex: '^(?!.*(3rdparty|build)).*$'
+  # 2. Readability: only keep useful rules
+  readability-braces-around-statements,
+  readability-container-size-empty,
+  readability-delete-null-pointer,
+  readability-redundant-member-init,
+  readability-redundant-smartptr-get,
+  readability-redundant-string-cstr,
+
+  # 3. Disable all intrusive/style-breaking rules
+  -readability-identifier-length,
+  -readability-avoid-const-params-in-decls,
+  -readability-else-after-return,
+  -cppcoreguidelines-avoid-magic-numbers,
+  -modernize-use-trailing-return-type,
+  -modernize-use-nodiscard,
+  -modernize-use-auto,
+  -modernize-pass-by-value,
+  -modernize-return-braced-init-list,
+  -modernize-use-default-member-init,
+  -modernize-loop-convert,
+  -modernize-concat-nested-namespaces,
+  -llvm-include-order,
+  -bugprone-unused-return-value,
+  -clang-diagnostic-unused-result,
+  -cppcoreguidelines-special-member-functions,
+  -performance-noexcept-move-constructor,
+  -cppcoreguidelines-narrowing-conversions,
+  -clang-diagnostic-error,
+  -cppcoreguidelines-pro-type-member-init,
+  -clang-analyzer-optin.cplusplus.UninitializedObject,
+  -cppcoreguidelines-pro-type-static-cast-downcast,
+  -performance-unnecessary-value-param,
+  -performance-enum-size,
+  -cppcoreguidelines-pro-bounds-pointer-arithmetic,
+  -cppcoreguidelines-pro-bounds-array-to-pointer-decay,
+  -clang-analyzer-deadcode.DeadStores,
+  -clang-analyzer-optin.cplusplus.VirtualCall,
+  -clang-diagnostic-tautological-constant-compare,
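clang-tidy processes the globs in a `Checks` list in order of appearance: a glob without a leading `-` adds matching checks, a glob with `-` removes them, so a later glob overrides an earlier one. That is why `cppcoreguidelines-pro-type-member-init`, listed among the retained categories, still ends up disabled by the `-cppcoreguidelines-pro-type-member-init` entry further down. The Python sketch below models only that ordering rule on an excerpt of the list; it is an illustration, not clang-tidy's implementation, and it deliberately ignores YAML folding, comment handling, and `InheritParentConfig`. The helpers `glob_to_regex` and `is_enabled` are hypothetical names.

```python
import re
from typing import List

# Excerpt of the Checks globs from the .clang-tidy above (not the full list).
CHECKS = [
    "clang-analyzer-*",
    "cppcoreguidelines-pro-type-member-init",
    "cppcoreguidelines-slicing",
    "readability-braces-around-statements",
    "-readability-identifier-length",
    "-cppcoreguidelines-pro-type-member-init",
    "-clang-analyzer-deadcode.DeadStores",
]


def glob_to_regex(glob: str):
    """Translate a clang-tidy style glob, where only '*' is special, into a regex."""
    return re.compile("^" + re.escape(glob).replace(r"\*", ".*") + "$")


def is_enabled(check: str, globs: List[str]) -> bool:
    """Apply the globs in order; the last glob that matches decides the result."""
    enabled = False
    for glob in globs:
        negative = glob.startswith("-")
        pattern = glob_to_regex(glob[1:] if negative else glob)
        if pattern.match(check):
            enabled = not negative
    return enabled


for name in [
    "clang-analyzer-core.NullDereference",     # True: only clang-analyzer-* matches
    "clang-analyzer-deadcode.DeadStores",      # False: removed by a later glob
    "cppcoreguidelines-pro-type-member-init",  # False: added, then removed
    "readability-braces-around-statements",    # True
    "modernize-use-auto",                      # False: never added
]:
    print(name, is_enabled(name, CHECKS))
```

Because the last matching glob wins, the order of the three commented groups in the file matters: moving group 3 above group 1 would re-enable checks it is meant to suppress.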
44 changes: 44 additions & 0 deletions .editorconfig
@@ -0,0 +1,44 @@
# https://editorconfig.org/

root = true

[*]
charset = utf-8
end_of_line = lf
indent_style = space
indent_size = 4
trim_trailing_whitespace = true
insert_final_newline = true

[*.{py,pyi}]
indent_size = 4

[*.{cpp,hpp,cxx,cc,c,h,cu,cuh}]
indent_size = 2

[{*.cmake,CMakeLists.txt}]
indent_size = 2

[*.{yaml,yml}]
indent_size = 2

[.clang-{format,tidy}]
indent_size = 2

[Makefile]
indent_style = tab

[*.sh]
indent_size = 4

[*.bat]
indent_size = 4
end_of_line = crlf

[*.md]
indent_size = 2
x-soft-wrap-text = true

[*.rst]
indent_size = 4
x-soft-wrap-text = true
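EditorConfig reads matching sections from top to bottom, and a property set in a later matching section overrides the same property from an earlier one; a pattern without a `/` may match a file at any directory level, i.e. it is effectively matched against the base name. Under those rules `Makefile` keeps `indent_size = 4` from `[*]` but switches to `indent_style = tab`, while C++/CUDA sources get two-space indentation. A minimal Python sketch of that resolution, assuming only the glob constructs this file uses (`*` and non-nested `{a,b}` alternatives); `expand_braces` and `resolve` are hypothetical helpers, not part of any EditorConfig library, and the example paths are illustrative.

```python
import fnmatch
import posixpath
import re
from typing import Dict, List, Tuple

# Sections copied from the .editorconfig above, in file order.
SECTIONS: List[Tuple[str, Dict[str, str]]] = [
    ("*", {"charset": "utf-8", "end_of_line": "lf", "indent_style": "space",
           "indent_size": "4", "trim_trailing_whitespace": "true",
           "insert_final_newline": "true"}),
    ("*.{py,pyi}", {"indent_size": "4"}),
    ("*.{cpp,hpp,cxx,cc,c,h,cu,cuh}", {"indent_size": "2"}),
    ("{*.cmake,CMakeLists.txt}", {"indent_size": "2"}),
    ("*.{yaml,yml}", {"indent_size": "2"}),
    (".clang-{format,tidy}", {"indent_size": "2"}),
    ("Makefile", {"indent_style": "tab"}),
    ("*.sh", {"indent_size": "4"}),
    ("*.bat", {"indent_size": "4", "end_of_line": "crlf"}),
    ("*.md", {"indent_size": "2", "x-soft-wrap-text": "true"}),
    ("*.rst", {"indent_size": "4", "x-soft-wrap-text": "true"}),
]

_BRACES = re.compile(r"\{([^{}]*)\}")


def expand_braces(pattern: str) -> List[str]:
    """Expand non-nested {a,b,...} groups into plain fnmatch patterns."""
    match = _BRACES.search(pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded: List[str] = []
    for option in match.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def resolve(path: str) -> Dict[str, str]:
    """Merge the properties of every matching section; later sections win."""
    name = posixpath.basename(path)  # patterns here contain no '/'
    props: Dict[str, str] = {}
    for pattern, section in SECTIONS:
        if any(fnmatch.fnmatchcase(name, p) for p in expand_braces(pattern)):
            props.update(section)
    return props


print(resolve("src/op/gemm.cc")["indent_size"])   # 2
print(resolve("Makefile")["indent_style"])        # tab
print(resolve("scripts/build.bat")["end_of_line"])  # crlf
```

The explicit `[*.{py,pyi}]` and `[*.sh]` sections repeat the `[*]` default of 4; they change nothing today but keep those file types stable if the global default is ever lowered.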
9 changes: 9 additions & 0 deletions .gitattributes
@@ -1 +1,10 @@
* text eol=lf
*.bat eol=crlf

*.svg binary
*.jpg binary
*.jpeg binary
*.png binary
*.gif binary

*.h linguist-language=C++
112 changes: 112 additions & 0 deletions .github/ISSUE_TEMPLATE/bug-report.yml
@@ -0,0 +1,112 @@
name: 🐛 Bug Report
description: File an issue about a bug.
title: "[BUG] "
labels: [bug]
assignees: []
body:
  - type: markdown
    attributes:
      value: >-
        Please do your best to make the issue as easy to act on as possible,
        and only submit here if there is clearly a problem with TileLang.

  - type: checkboxes
    id: steps
    attributes:
      label: Required prerequisites
      description: Make sure you've completed the following steps before submitting your issue -- thank you!
      options:
        - label: I have read the documentation <https://tilelang.com>.
          required: true
        - label: >-
            I have searched the [Issue Tracker](https://github.com/tile-ai/tilelang/issues)
            that this hasn't already been reported. (comment there if it has.)
          required: true

  - type: input
    id: version
    attributes:
      label: What version of TileLang are you using?
      description: >-
        Run command `python3 -c 'print(__import__("tilelang").__version__)'` in your shell
        and paste the output here.
      placeholder: E.g., 0.1.5
    validations:
      required: true

  - type: textarea
    id: system-info
    attributes:
      label: System information
      description: |
        Describe the characteristics of your environment:

        - Describe how the library was installed (pip, conda, source, ...)
        - Python version
        - Versions of any other relevant libraries

        ```python
        import sys, tilelang, torch
        print(sys.version, sys.platform)
        print(tilelang.__version__)
        print(torch.__version__)
        ```

        ```bash
        python3 -m torch.utils.collect_env
        ```
    validations:
      required: true

  - type: textarea
    id: description
    attributes:
      label: Problem description
      description: >-
        Provide a short description, state the expected behavior and what actually happens. Include
        relevant information like what version of TileLang you are using, what system you are on, and
        any useful commands / output.
    validations:
      required: true

  - type: textarea
    id: code
    attributes:
      label: Reproducible example code
      description: >-
        The code should be minimal, have minimal external dependencies, and isolate the functions
        that cause breakage. Submit matched and complete snippets that can be easily run to diagnose
        the issue.
      value: |
        The Python snippets:

        ```python

        ```
    validations:
      required: true

  - type: textarea
    id: traceback
    attributes:
      label: Traceback
      description: Put the Python traceback information here.
      placeholder: |
        Traceback (most recent call last):
        File ...
      render: pytb

  - type: textarea
    id: expected
    attributes:
      label: Expected behavior
      description: Provide a clear and concise description of what you expected to happen.

  - type: textarea
    id: additional-context
    attributes:
      label: Additional context
      description: >-
        Add any other context about the problem here. Screenshots may also be helpful.

        If you know or suspect the reason for this bug, paste the code lines and suggest modifications.
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1 @@
blank_issues_enabled: false
45 changes: 45 additions & 0 deletions .github/ISSUE_TEMPLATE/feature-request.yml
@@ -0,0 +1,45 @@
name: ✨ Feature Request
description: Suggest an idea for this project.
title: "[Feature Request] "
labels: [enhancement]
body:
  - type: checkboxes
    id: steps
    attributes:
      label: Required prerequisites
      description: Make sure you've completed the following steps before submitting your issue -- thank you!
      options:
        - label: >-
            I have searched the [Issue Tracker](https://github.com/tile-ai/tilelang/issues)
            that this hasn't already been reported. (comment there if it has.)
          required: true

  - type: textarea
    id: motivation
    attributes:
      label: Motivation
      description: Outline the motivation for the proposal.
      value: |
        <!-- Please outline the motivation for the proposal.
        Is your feature request related to a problem? E.g., "I'm always frustrated when [...]".
        If this is related to another issue, please link here too. -->
    validations:
      required: true

  - type: textarea
    id: solution
    attributes:
      label: Solution
      description: Provide a clear and concise description of what you want to happen.

  - type: textarea
    id: alternatives
    attributes:
      label: Alternatives
      description: A clear and concise description of any alternative solutions or features you've considered.

  - type: textarea
    id: additional-context
    attributes:
      label: Additional context
      description: Add any other context about the problem here. Screenshots may also be helpful.
25 changes: 25 additions & 0 deletions .github/ISSUE_TEMPLATE/questions.yml
@@ -0,0 +1,25 @@
name: 🤔 Questions / Help / Support
description: Do you need support?
title: "[Question] "
labels: [question]
body:
  - type: checkboxes
    id: steps
    attributes:
      label: Required prerequisites
      description: Make sure you've completed the following steps before submitting your issue -- thank you!
      options:
        - label: I have read the documentation <https://tilelang.com>.
          required: true
        - label: >-
            I have searched the [Issue Tracker](https://github.com/tile-ai/tilelang/issues)
            that this hasn't already been reported. (comment there if it has.)
          required: true

  - type: textarea
    id: questions
    attributes:
      label: Questions
      description: Describe your questions with relevant resources such as snippets, links, images, etc.
    validations:
      required: true
11 changes: 11 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,11 @@
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
      day: "monday"
      time: "12:00"
      timezone: "Asia/Shanghai"
    commit-message:
      prefix: "[CI]"
64 changes: 0 additions & 64 deletions .github/workflows/bot.yml

This file was deleted.
