Comparing changes

base repository: tile-ai/tilelang
base: v0.1.2
head repository: tile-ai/tilelang
compare: v0.1.2.post1

  • 8 commits
  • 37 files changed
  • 5 contributors

Commits on Mar 6, 2025

  1. Add libstdcxx-ng-12 to Dockerfiles for CUDA versions (#160)

    Update Dockerfiles for CUDA 11.8, 12.0, 12.1, 12.3, 12.4, 12.5, and 12.6 to install libstdcxx-ng-12 from conda-forge, ensuring consistent C++ standard library support across the different CUDA versions.
    LeiWang1999 authored Mar 6, 2025 · commit 5935e37
  2. Add CPU JIT with ctypes backend (#154)

    * Add CPU JIT with ctypes backend
    
    * Resolve some lint issues
    
    * Apply PR feedback on header file and kernel example
    
    * Add test cases
    
    * Resolve formatting issues
    
    * Resolve formatting issues
    
    ---------
    
    Co-authored-by: xxw <1990389406@qq.con>
    xxw-keju and xxw authored Mar 6, 2025 · commit ce14650
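    At its core, the approach behind #154 is to compile generated C source into a shared library and call it through ctypes. The sketch below is a minimal, self-contained illustration of that ctypes-backend idea using an invented add_one kernel; it is not tilelang's actual JIT entry point, and it assumes a C compiler is available on the system as cc.

      # Minimal sketch of a ctypes-backed CPU JIT: compile C source to a shared
      # library, load it with ctypes, and call it on a NumPy buffer. Illustrative
      # only; not tilelang's real API. Assumes a `cc` compiler and NumPy.
      import ctypes
      import os
      import subprocess
      import tempfile

      import numpy as np

      C_SOURCE = """
      void add_one(float *x, int n) {
          for (int i = 0; i < n; ++i) x[i] += 1.0f;
      }
      """

      def jit_compile(source: str) -> ctypes.CDLL:
          workdir = tempfile.mkdtemp()
          src = os.path.join(workdir, "kernel.c")
          lib = os.path.join(workdir, "kernel.so")
          with open(src, "w") as f:
              f.write(source)
          # -fPIC/-shared produce a loadable shared object.
          subprocess.check_call(["cc", "-O2", "-fPIC", "-shared", src, "-o", lib])
          return ctypes.CDLL(lib)

      lib = jit_compile(C_SOURCE)
      lib.add_one.argtypes = [ctypes.POINTER(ctypes.c_float), ctypes.c_int]
      x = np.zeros(8, dtype=np.float32)
      lib.add_one(x.ctypes.data_as(ctypes.POINTER(ctypes.c_float)), x.size)
      print(x)  # all elements are now 1.0
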
  3. [Carver] Multi-Threaded Compilation for Fast Auto Tuning (#156)

    * [Carver] Multi-Threaded Compilation for Fast Auto Tuning
    
    * Add progress bar for compilation
    
    * lint
    SiriusNEO authored Mar 6, 2025 · commit 9789049
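    #156 parallelizes candidate-kernel compilation during auto tuning and adds a progress bar. A minimal sketch of that pattern follows; compile_config, candidate_configs, and the use of tqdm are assumptions made for illustration, not Carver's actual tuning interfaces.

      # Sketch of multi-threaded candidate compilation with a progress bar, in the
      # spirit of #156. `compile_config` and `candidate_configs` are placeholders.
      from concurrent.futures import ThreadPoolExecutor, as_completed

      from tqdm import tqdm  # assumed available, only for the progress bar

      def compile_config(cfg):
          ...  # build and compile one tuning candidate, return the compiled artifact

      def compile_all(candidate_configs, num_threads=8):
          results = []
          with ThreadPoolExecutor(max_workers=num_threads) as pool:
              futures = {pool.submit(compile_config, c): c for c in candidate_configs}
              for fut in tqdm(as_completed(futures), total=len(futures),
                              desc="compiling candidates"):
                  results.append((futures[fut], fut.result()))
          return results
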
  4. Refactor MLA decode kernel: Replace T.If with native Python if statement (#162)

    Simplify the control flow in the MLA decode kernel by replacing TileLang's T.If construct with a standard Python if statement. This change improves code readability and maintains the existing logic for handling sequence length constraints during block-wise computation.
    LeiWang1999 authored Mar 6, 2025 · commit a00c797
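    Because TileLang kernels are Python functions that are traced to build a program, a condition that is already known while the kernel is being constructed can be decided by a plain Python if instead of an IR-level conditional such as T.If. The snippet below illustrates that build-time-versus-runtime distinction with an ordinary Python builder; it is not the MLA decode kernel, and every name in it is invented for illustration.

      # Illustration of the idea behind #162: a condition known at construction
      # time is handled by a Python `if`, so nothing conditional needs to be
      # emitted into the generated program. Names are illustrative only.
      def build_decode_ops(seq_len: int, block_size: int):
          ops = []
          num_full = seq_len // block_size
          for b in range(num_full):
              ops.append(("full_block", b))
          # Evaluated while the kernel description is built, not at run time.
          if seq_len % block_size != 0:
              ops.append(("tail_block", num_full))
          return ops

      print(build_decode_ops(seq_len=130, block_size=64))
      # [('full_block', 0), ('full_block', 1), ('tail_block', 2)]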

Commits on Mar 7, 2025

  1. [Enhancement] Improve CUDA path detection (#157)

    * [Typo] Fix formatting in installation instructions in README.md
    
    * [Enhancement] Improve CUDA path detection and update configuration handling
    
    * fix typo
    
    * remove IS_WINDOWS constant
    
    * lint fix
    
    * Improve error messages for CUDA detection failure
    
    * lint fix
    
    * lint fix
    
    * Fix .gitignore to correctly include venv directory
    xwhzz authored Mar 7, 2025 · commit 25002e6
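    #157 makes CUDA toolkit discovery more robust and its failure messages more informative. Below is a hedged sketch of the usual detection order (environment variables, then nvcc on PATH, then the conventional default install location); it shows the general approach rather than tilelang's exact implementation.

      # Sketch of CUDA toolkit detection in the spirit of #157. Mirrors the common
      # approach, not tilelang's exact logic.
      import os
      import shutil
      from pathlib import Path

      def find_cuda_home() -> Path:
          # 1. Explicit environment variables win.
          for var in ("CUDA_HOME", "CUDA_PATH"):
              value = os.environ.get(var)
              if value and Path(value).exists():
                  return Path(value)
          # 2. Fall back to nvcc on PATH: .../cuda/bin/nvcc -> .../cuda
          nvcc = shutil.which("nvcc")
          if nvcc:
              return Path(nvcc).resolve().parent.parent
          # 3. Conventional default location.
          default = Path("/usr/local/cuda")
          if default.exists():
              return default
          raise RuntimeError(
              "CUDA toolkit not found: set CUDA_HOME/CUDA_PATH or make sure "
              "`nvcc` is on PATH."
          )
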
  2. [Refactor] Replace T.thread_binding with T.get_thread_binding in examples and test cases (#163)

    * [Refactor] Update BitBLAS Benchmark with TileLang Carver Imports and Roller Hints Generation
    
    - Replace BitBLAS imports with TileLang Carver imports in benchmark_matmul.py
    - Modify roller hints generation using new TileLang Carver template and utility functions
    - Update get_roller_hints_from_func to handle None cases and improve return logic
    - Adjust DefaultPolicy to handle different codegen dictionary formats
    
    * [Refactor] Update Thread Binding and Import Statements in TileLang Kernels
    
    - Replace T.thread_binding() with T.get_thread_binding() across multiple kernel test files
    - Update import statements for MMA layout and macro generator in dequantize GEMM and FP8 examples
    - Move map_torch_type utility function to tilelang.utils.tensor
    - Remove unnecessary imports and improve code organization
    LeiWang1999 authored Mar 7, 2025 · commit 4342d34
  3. [Bugfix] Cast bool dtype into int8 in blocksparse examples (#167)

    * Refactor Native Sparse Attention Example with Enhanced Triton Kernel
    
    - Update parallel_nsa_fwd_kernel to support more flexible sparse attention computation
    - Add support for block counts and offsets in the Triton kernel
    - Modify kernel grid and computation logic for improved performance
    - Update example script to use naive_nsa_simple reference implementation
    - Improve type hints and kernel configuration
    LeiWang1999 authored Mar 7, 2025 · commit 5a63e65
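    The bugfix in #167 casts boolean mask tensors to int8 before they reach the blocksparse kernels, since a bool buffer is not a dtype every kernel interface accepts. A short PyTorch sketch of that cast is below; the tensor names are invented and do not come from the example files.

      # Illustrative sketch of the #167 fix: hand the block mask to the kernel as
      # int8 rather than bool. Tensor names are invented for clarity.
      import torch

      num_blocks = 16
      block_mask = torch.rand(num_blocks, num_blocks) > 0.5   # dtype=torch.bool
      block_mask_i8 = block_mask.to(torch.int8)                # same 0/1 values, int8 dtype
      # kernel(..., block_mask_i8)  # the kernel now receives a dtype it supports
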
  4. [Example] Implement NSA Decode tilelang examples (#168)

    * Add Native Sparse Attention Examples with Tilelang and Triton Implementations
    
    - Introduce new example scripts for native sparse attention:
      * example_tilelang_nsa_fwd.py: Forward pass implementation using TileLang
      * example_tilelang_nsa_decode.py: Decoding-specific sparse attention implementation
      * example_triton_nsa_fwd.py: Triton-based sparse attention forward pass
    - Update reference.py with naive implementations for sparse attention
    - Support different sparse attention scenarios including forward pass and inference
    - Add comprehensive testing and validation against reference implementations
    
    * lint fix
    LeiWang1999 authored Mar 7, 2025 · commit d8a06c0
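    For orientation, the decode-side computation these examples target can be written naively as a single query attending only to its selected KV blocks. The PyTorch sketch below is a hedged, from-scratch illustration of that idea; it is not the reference.py implementation added in #168, and all names in it are invented.

      # Naive single-head, single-query block-sparse attention decode step, as an
      # illustration of what the NSA decode examples compute. Not the shipped
      # reference implementation.
      import torch
      import torch.nn.functional as F

      def naive_nsa_decode(q, k_cache, v_cache, block_idx, block_size):
          # q: [d], k_cache/v_cache: [T, d], block_idx: selected KV block indices
          cols = torch.cat([
              torch.arange(b * block_size, (b + 1) * block_size) for b in block_idx
          ])
          cols = cols[cols < k_cache.shape[0]]          # clip the ragged last block
          scores = (k_cache[cols] @ q) / (q.shape[0] ** 0.5)
          return F.softmax(scores, dim=0) @ v_cache[cols]

      d, T, block_size = 64, 200, 32
      q, k, v = torch.randn(d), torch.randn(T, d), torch.randn(T, d)
      out = naive_nsa_decode(q, k, v, block_idx=[0, 3, 6], block_size=block_size)
      print(out.shape)  # torch.Size([64])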