Commit 96e2140
refactor fp4 masked gemm cute-dsl implementation and add manual cache (#1521)
<!-- .github/pull_request_template.md -->
## 📌 Description
The kernel interface exposed in #1331 is not friendly to JIT caching
and mixes compile-time parameters with runtime arguments. This PR
refactors the kernel classes according to these rules:
1. Compile-time parameters should be passed to `__init__`.
2. Use `@cute.kernel` to decorate device-side kernels.
3. Use `@cute.jit` to decorate host-side launcher functions
(https://docs.nvidia.com/cutlass/media/docs/pythonDSL/cute_dsl_general/dsl_introduction.html),
where we can call device-side kernels.
4. `cute.compile` accepts a host-side launcher function decorated by
`@cute.jit`.
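
The following is a minimal pure-Python sketch of the caching pattern these rules enable. It does not use the CuTe DSL: `fake_compile`, `MaskedGemmSketch`, `get_compiled`, and the `tile_m`/`tile_n`/`dtype` parameters are hypothetical stand-ins chosen for illustration, not names from this PR. The point is only that once compile-time parameters are confined to `__init__`, they can form a cache key, so a compiled launcher is reused across calls with different runtime arguments.

```python
from typing import Callable, Dict, List, Tuple

# Counts how many times the (stand-in) compile step runs.
stats = {"compiles": 0}


def fake_compile(launcher: Callable) -> Callable:
    """Hypothetical stand-in for a compile step such as `cute.compile`."""
    stats["compiles"] += 1
    return launcher


class MaskedGemmSketch:
    """Compile-time parameters go to __init__; runtime arguments go to __call__."""

    def __init__(self, tile_m: int, tile_n: int, dtype: str):
        # Assumed compile-time parameters (illustrative only).
        self.tile_m = tile_m
        self.tile_n = tile_n
        self.dtype = dtype

    def __call__(self, a: List[List[float]], b: List[List[float]], masked_m: int):
        # Runtime arguments: compute only the first `masked_m` rows of a @ b.
        k, n = len(b), len(b[0])
        return [
            [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(masked_m)
        ]


# Manual cache keyed only by compile-time parameters.
_cache: Dict[Tuple[int, int, str], Callable] = {}


def get_compiled(tile_m: int, tile_n: int, dtype: str) -> Callable:
    key = (tile_m, tile_n, dtype)
    if key not in _cache:
        _cache[key] = fake_compile(MaskedGemmSketch(*key))
    return _cache[key]
```

Calling `get_compiled` twice with the same compile-time parameters returns the same object and runs the compile step once; changing any compile-time parameter triggers a new compile, while changing only the runtime arguments does not.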
## 🔍 Related Issues
#1519
## 🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.
### ✅ Pre-commit Checks
- [ ] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [ ] I have installed the hooks with `pre-commit install`.
- [ ] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.
> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).
## 🧪 Tests
- [ ] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).
## Reviewer Notes
<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->
1 parent f1fd5c6, commit 96e2140
3 files changed (+262, -250 lines) in:
- flashinfer/cute_dsl
- tests