Commit 96e2140

refactor fp4 masked gemm cute-dsl implementation and add manual cache (#1521)
<!-- .github/pull_request_template.md -->

## 📌 Description

The kernel interface exposed in #1331 is not friendly to JIT caching and mixes compile-time parameters with runtime arguments. This PR refactors these classes according to the following rules:

1. Compile-time parameters should be passed in `__init__` functions.
2. Use `@cute.kernel` to decorate device-side kernels.
3. Use `@cute.jit` to decorate host-side launcher functions (https://docs.nvidia.com/cutlass/media/docs/pythonDSL/cute_dsl_general/dsl_introduction.html), from which device-side kernels can be called.
4. `cute.compile` accepts a host-side launcher function decorated with `@cute.jit`.

## 🔍 Related Issues

#1519

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [ ] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [ ] I have installed the hooks with `pre-commit install`.
- [ ] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [ ] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).

## Reviewer Notes

<!-- Optional: anything you'd like reviewers to focus on, concerns, etc. -->
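The separation described in the PR body, with compile-time parameters fixed at construction and a manual cache in front of compilation, can be sketched in plain Python. This is only an illustration and not the code from this commit: `GemmConfig`, `MaskedGemmLauncher`, `fake_compile`, and `get_compiled_kernel` are hypothetical names, the field values are made up, and `fake_compile` merely stands in for the place where a real implementation would call `cute.compile` on a `@cute.jit` launcher.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Matrix = List[List[float]]


@dataclass(frozen=True)
class GemmConfig:
    """Hypothetical compile-time parameters (hashable, so usable as a cache key)."""
    tile_m: int
    tile_n: int
    ab_dtype: str


class MaskedGemmLauncher:
    """Stand-in for a host-side launcher; the real one would be decorated with @cute.jit."""

    def __init__(self, config: GemmConfig) -> None:
        # Compile-time parameters are fixed at construction time.
        self.config = config

    def __call__(self, a: Matrix, b: Matrix) -> Matrix:
        # Runtime arguments only. A plain-Python matmul replaces the device kernel.
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


COMPILE_CALLS = {"count": 0}
_KERNEL_CACHE: Dict[GemmConfig, Callable[[Matrix, Matrix], Matrix]] = {}


def fake_compile(launcher: MaskedGemmLauncher) -> Callable[[Matrix, Matrix], Matrix]:
    """Placeholder for the expensive compile step (assumed to be cute.compile in practice)."""
    COMPILE_CALLS["count"] += 1
    return launcher


def get_compiled_kernel(config: GemmConfig) -> Callable[[Matrix, Matrix], Matrix]:
    """Manual cache: compile once per distinct set of compile-time parameters."""
    compiled = _KERNEL_CACHE.get(config)
    if compiled is None:
        compiled = fake_compile(MaskedGemmLauncher(config))
        _KERNEL_CACHE[config] = compiled
    return compiled
```

Because the cache key contains only compile-time parameters, repeated calls with different runtime inputs but the same configuration reuse the compiled object instead of triggering another compilation.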
1 parent f1fd5c6 commit 96e2140

3 files changed, +262 -250 lines changed

0 commit comments
