Comparing changes
base repository: tile-ai/tilelang, base: v0.1.2
head repository: tile-ai/tilelang, compare: v0.1.2.post1
- 8 commits
- 37 files changed
- 5 contributors
Commits on Mar 6, 2025
- Add libstdcxx-ng-12 to Dockerfiles for CUDA versions (#160)
  Commit: 5935e37
  Update Dockerfiles for CUDA 11.8, 12.0, 12.1, 12.3, 12.4, 12.5, and 12.6 to install libstdcxx-ng-12 from conda-forge, ensuring consistent standard library support across CUDA versions.
- Add cpu jit with backend ctypes (#154)
  Commit: ce14650
  * Add cpu jit with backend ctypes
  * Resolve some lint issues
  * Apply PR feedback on header file and kernel example
  * Add test cases
  * Resolve formatting issues
  Co-authored-by: xxw <1990389406@qq.con>
- [Carver] Multi-Threads Compilation for Fast Auto Tuning (#156)
  Commit: 9789049
  * Multi-threaded compilation for fast auto-tuning
  * Add progress bar for compilation
  * Lint fixes
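The shape of #156 is the standard parallel-compilation pattern for auto-tuning: each candidate config is compiled in a worker, and completions drive a progress indicator. A hedged sketch follows; `compile_candidate` and `parallel_compile` are stand-in names, not the Carver API, and the plain counter stands in for the progress bar.

```python
# Illustrative sketch of multi-threaded candidate compilation for tuning.
from concurrent.futures import ThreadPoolExecutor, as_completed

def compile_candidate(config):
    # Stand-in for an expensive, GIL-releasing compiler invocation.
    return {"config": config, "kernel": f"kernel<{config}>"}

def parallel_compile(configs, max_workers=4):
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(compile_candidate, c): c for c in configs}
        for done, fut in enumerate(as_completed(futures), start=1):
            results.append(fut.result())
            print(f"compiled {done}/{len(futures)}")  # progress-bar stand-in
    return results

built = parallel_compile([{"block": b} for b in (32, 64, 128)])
```

Threads suffice when the heavy lifting happens in a compiler subprocess or C extension that releases the GIL; otherwise a process pool is the usual substitute.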
- Refactor MLA decode kernel: Replace T.If with native Python if statement (#162)
  Commit: a00c797
  Simplify the control flow in the MLA decode kernel by replacing TileLang's T.If construct with a standard Python if statement. This change improves code readability and maintains the existing logic for handling sequence length constraints during block-wise computation.
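The idea behind #162 can be illustrated without tilelang at all: when a condition is decidable while the program is being traced, an ordinary Python `if` chooses which ops get recorded, so no explicit If node is needed in the generated program. The toy tracer below is a hypothetical stand-in, not the MLA kernel.

```python
# Illustrative sketch: trace-time branching with a plain Python `if`.
def build_kernel(seq_len, block):
    """Record one op per block; the partial tail block gets a masked op."""
    ops = []
    for start in range(0, seq_len, block):
        # Evaluated while *building* the program, not at kernel runtime,
        # so a native Python `if` replaces an IR-level If construct.
        if start + block <= seq_len:
            ops.append(("compute_full", start))
        else:
            ops.append(("compute_masked", start, seq_len - start))
    return ops

print(build_kernel(seq_len=10, block=4))
# [('compute_full', 0), ('compute_full', 4), ('compute_masked', 8, 2)]
```

An IR-level If remains necessary only when the condition depends on values known solely at kernel runtime.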
Commits on Mar 7, 2025
- [Enhancement] Improve CUDA path detection (#157)
  Commit: 25002e6
  * [Typo] Fix formatting in installation instructions in README.md
  * [Enhancement] Improve CUDA path detection and update configuration handling
  * Remove IS_WINDOWS constant
  * Improve error messages for CUDA detection failure
  * Fix .gitignore to correctly include venv directory
  * Assorted typo and lint fixes
- [Refactor] Replace T.thread_binding with T.get_thread_binding in examples and test cases (#163)
  Commit: 4342d34
  * [Refactor] Update BitBLAS Benchmark with TileLang Carver Imports and Roller Hints Generation
    - Replace BitBLAS imports with TileLang Carver imports in benchmark_matmul.py
    - Modify roller hints generation using new TileLang Carver template and utility functions
    - Update get_roller_hints_from_func to handle None cases and improve return logic
    - Adjust DefaultPolicy to handle different codegen dictionary formats
  * [Refactor] Update Thread Binding and Import Statements in TileLang Kernels
    - Replace T.thread_binding() with T.get_thread_binding() across multiple kernel test files
    - Update import statements for MMA layout and macro generator in dequantize GEMM and FP8 examples
    - Move map_torch_type utility function to tilelang.utils.tensor
    - Remove unnecessary imports and improve code organization
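A rename like `T.thread_binding` to `T.get_thread_binding` is often paired with a thin deprecated alias so call sites can migrate gradually. The sketch below shows that pattern with stand-in definitions; these are not tilelang's actual functions, only an illustration of the migration mechanics.

```python
# Illustrative API-rename shim: old name delegates to the new one and
# warns, so unmigrated examples keep working.
import warnings

def get_thread_binding(axis="threadIdx.x"):
    """New-style accessor for the current thread binding (stand-in)."""
    return ("thread_binding", axis)

def thread_binding(axis="threadIdx.x"):
    """Deprecated alias kept for backward compatibility."""
    warnings.warn(
        "thread_binding is deprecated; use get_thread_binding",
        DeprecationWarning,
        stacklevel=2,
    )
    return get_thread_binding(axis)

assert thread_binding("threadIdx.y") == get_thread_binding("threadIdx.y")
```

Note that the actual PR chose a clean break, updating examples and tests directly rather than shipping an alias; the shim is simply the conventional alternative when downstream code cannot be updated in lockstep.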
- [Bugfix] Cast bool dtype into int8 in blocksparse examples (#167)
  Commit: 5a63e65
  * Refactor Native Sparse Attention Example with Enhanced Triton Kernel
    - Update parallel_nsa_fwd_kernel to support more flexible sparse attention computation
    - Add support for block counts and offsets in the Triton kernel
    - Modify kernel grid and computation logic for improved performance
    - Update example script to use naive_nsa_simple reference implementation
    - Improve type hints and kernel configuration
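The underlying issue in #167 is a common one: backends without a native boolean buffer type reject 1-bit masks, so a boolean block-sparsity mask is materialized as int8 (0/1) before being handed to the kernel. Plain stdlib Python stands in for the tensor library in this sketch.

```python
# Sketch of the bool -> int8 cast using the stdlib `array` module:
# 'b' is a signed 8-bit element, mirroring an int8 tensor buffer.
from array import array

block_mask = [True, False, True, True]              # bool sparsity mask
mask_i8 = array("b", (int(v) for v in block_mask))  # materialize as int8

print(mask_i8.itemsize, list(mask_i8))  # 1 [1, 0, 1, 1]
```

In a tensor library the same fix is a one-liner cast of the mask to the int8 dtype before kernel launch; inside the kernel the mask is then compared against zero.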
- [Example] Implement NSA Decode tilelang examples (#168)
  Commit: d8a06c0
  * Add Native Sparse Attention Examples with Tilelang and Triton Implementations
    - example_tilelang_nsa_fwd.py: forward pass implementation using TileLang
    - example_tilelang_nsa_decode.py: decoding-specific sparse attention implementation
    - example_triton_nsa_fwd.py: Triton-based sparse attention forward pass
    - Update reference.py with naive implementations for sparse attention
    - Support different sparse attention scenarios including forward pass and inference
    - Add comprehensive testing and validation against reference implementations
  * Lint fix
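At its core, an NSA-style decode reference attends a single query only to the KV positions inside selected blocks, with an ordinary softmax over the selected scores. The pure-Python sketch below conveys that structure; shapes, names, and the block-selection scheme are illustrative and not taken from the example scripts.

```python
# Naive block-sparse decode attention for one query (illustrative).
import math

def sparse_decode_attention(q, keys, values, block_idx, block_size):
    # Gather only the KV positions inside the selected blocks.
    sel = [p for b in block_idx
           for p in range(b * block_size, (b + 1) * block_size)
           if p < len(keys)]
    # Dot-product scores against the selected keys only.
    scores = [sum(qi * ki for qi, ki in zip(q, keys[p])) for p in sel]
    # Numerically stable softmax over the selected scores.
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    dim = len(values[0])
    return [sum(w[j] * values[p][d] for j, p in enumerate(sel)) / z
            for d in range(dim)]

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
out = sparse_decode_attention(q, keys, values, block_idx=[0, 1], block_size=2)
```

The real kernels tile this computation over heads and blocks on the GPU; the point of the reference is only that sparsity enters through the gather step, while the softmax-weighted sum is unchanged.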
The full comparison can be viewed locally with:
git diff v0.1.2...v0.1.2.post1