[KernelGen][Nvidia] Add rnn_relu operator with Triton kernel #3558
Open
XDYuanzhuLee wants to merge 3 commits into
Add Triton kernel implementation for single-layer unidirectional RNN with ReLU activation. Includes accuracy tests and performance benchmark. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
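For readers unfamiliar with the operator, the recurrence a single-layer unidirectional ReLU RNN computes can be sketched in NumPy. This is a reference sketch only, not the Triton kernel from this PR; the function name `rnn_relu_reference`, the time-major `(T, B, I)` input layout, and the weight shapes are assumptions that follow the usual `h_t = relu(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh)` convention.

```python
import numpy as np


def rnn_relu_reference(x, h0, w_ih, w_hh, b_ih, b_hh):
    """Hypothetical reference: x (T, B, I), h0 (B, H), w_ih (H, I), w_hh (H, H)."""
    h = h0
    outputs = []
    for x_t in x:  # iterate over time steps
        # Elman update with ReLU as the nonlinearity
        h = np.maximum(x_t @ w_ih.T + b_ih + h @ w_hh.T + b_hh, 0.0)
        outputs.append(h)
    return np.stack(outputs), h


rng = np.random.default_rng(0)
T, B, I, H = 5, 2, 3, 4
x = rng.standard_normal((T, B, I))
h0 = np.zeros((B, H))
w_ih = rng.standard_normal((H, I))
w_hh = rng.standard_normal((H, H))
b_ih = rng.standard_normal(H)
b_hh = rng.standard_normal(H)

out, h_n = rnn_relu_reference(x, h0, w_ih, w_hh, b_ih, b_hh)
print(out.shape, h_n.shape)
```

A reference like this is the kind of baseline an accuracy test would compare the kernel output against.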
Comment on lines +14 to +20
@triton.autotune(
    configs=[
        triton.Config({"BLOCK_SIZE": 128}, num_stages=4, num_warps=4),
        triton.Config({"BLOCK_SIZE": 256}, num_stages=4, num_warps=4),
        triton.Config({"BLOCK_SIZE": 512}, num_stages=4, num_warps=4),
        triton.Config({"BLOCK_SIZE": 1024}, num_stages=4, num_warps=4),
    ],
Collaborator
Please move the autotune configs into the config file.
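One way to act on this comment is to keep the tuning entries as data and build the config objects at import time. The sketch below is an assumption throughout: the JSON layout, the `rnn_relu` key, and the helpers `load_tune_entries` and `build_configs` are hypothetical and do not reflect FlagGems' actual config schema or loader. A stand-in factory replaces `triton.Config` so the example runs without Triton.

```python
import json

# Hypothetical contents of a tuning config file; the key name and field
# layout are illustrative assumptions, not FlagGems' real schema.
CONFIG_TEXT = """
{
  "rnn_relu": [
    {"meta": {"BLOCK_SIZE": 128},  "num_stages": 4, "num_warps": 4},
    {"meta": {"BLOCK_SIZE": 256},  "num_stages": 4, "num_warps": 4},
    {"meta": {"BLOCK_SIZE": 512},  "num_stages": 4, "num_warps": 4},
    {"meta": {"BLOCK_SIZE": 1024}, "num_stages": 4, "num_warps": 4}
  ]
}
"""


def load_tune_entries(text, op_name):
    """Return the list of tuning entries recorded for one operator."""
    return json.loads(text)[op_name]


def build_configs(entries, config_factory):
    """Turn plain dict entries into config objects via the given factory."""
    return [
        config_factory(e["meta"], num_stages=e["num_stages"], num_warps=e["num_warps"])
        for e in entries
    ]


def fake_config(meta, num_stages, num_warps):
    """Stand-in for triton.Config so the sketch runs without Triton."""
    return (meta["BLOCK_SIZE"], num_stages, num_warps)


entries = load_tune_entries(CONFIG_TEXT, "rnn_relu")
configs = build_configs(entries, fake_config)
print(configs)
```

With Triton installed, passing `triton.Config` as the factory would produce objects suitable for `@triton.autotune(configs=..., key=[...])`, leaving the kernel file free of hard-coded block sizes.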
Comment on lines +443 to +446
# Unsupported configuration
raise NotImplementedError(
    "GEMS RNN_RELU only supports single-layer " "unidirectional without dropout"
)
Collaborator
We need a more general implementation.
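To make the request concrete, here is a NumPy sketch of what a more general forward pass could cover: multiple layers and an optional reverse direction, with each layer's outputs concatenated along the feature axis. The names `run_direction` and `rnn_relu_general` and the nested `params[layer][direction]` layout are hypothetical, and dropout is left out (inference only). This illustrates the semantics, not the author's implementation.

```python
import numpy as np


def run_direction(x, h0, w_ih, w_hh, b_ih, b_hh, reverse=False):
    """Run one ReLU RNN direction over x (T, B, I); returns (T, B, H) and final h."""
    T = x.shape[0]
    steps = range(T - 1, -1, -1) if reverse else range(T)
    out = np.empty((T, x.shape[1], h0.shape[1]))
    h = h0
    for t in steps:
        h = np.maximum(x[t] @ w_ih.T + b_ih + h @ w_hh.T + b_hh, 0.0)
        out[t] = h  # reverse direction still writes at its own time index
    return out, h


def rnn_relu_general(x, h0, params, num_layers, bidirectional):
    """params[layer][direction] = (w_ih, w_hh, b_ih, b_hh); h0 is (num_layers * D, B, H)."""
    D = 2 if bidirectional else 1
    layer_in, h_n = x, []
    for layer in range(num_layers):
        outs = []
        for d in range(D):
            out, h = run_direction(
                layer_in, h0[layer * D + d], *params[layer][d], reverse=(d == 1)
            )
            outs.append(out)
            h_n.append(h)
        # Next layer consumes both directions concatenated on the feature axis
        layer_in = np.concatenate(outs, axis=-1)
    return layer_in, np.stack(h_n)


rng = np.random.default_rng(0)
T, B, I, H, L = 4, 2, 3, 5, 2
params = []
for layer in range(L):
    in_size = I if layer == 0 else 2 * H
    params.append([
        (rng.standard_normal((H, in_size)), rng.standard_normal((H, H)),
         rng.standard_normal(H), rng.standard_normal(H))
        for _ in range(2)
    ])
x = rng.standard_normal((T, B, I))
h0 = np.zeros((L * 2, B, H))
out, h_n = rnn_relu_general(x, h0, params, num_layers=L, bidirectional=True)
print(out.shape, h_n.shape)
```

The single-layer unidirectional case is the special case `num_layers=1, bidirectional=False`, so an implementation along these lines would not need the `NotImplementedError` branch for layer count or direction.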
Schopenhauer-loves-Hegel requested changes on May 28, 2026
Schopenhauer-loves-Hegel requested changes on May 29, 2026
Collaborator Schopenhauer-loves-Hegel left a comment
FlagGems Automated Review
Operator: rnn_relu
Verdict: REQUEST_CHANGES
Summary
- Errors: 1
- Warnings: 0
- Suggestions: 0
Issues (no specific line)
- [ERROR] UNKNOWN: The commit contains 'Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com'. FlagGems does not allow Co-Authored-By in commit messages. Fix: use git rebase -i to edit the commit message and delete the Co-Authored-By line.
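The fix itself is done with `git rebase -i`, but the text change it amounts to can be sketched in Python. The helper `strip_co_authored_by` below is hypothetical and not part of FlagGems or its review tooling; it only shows which lines would be dropped from a commit message.

```python
import re

# Matches a Co-Authored-By trailer line, case-insensitively.
CO_AUTHOR_RE = re.compile(r"^\s*co-authored-by:.*$", re.IGNORECASE)


def strip_co_authored_by(message):
    """Return the commit message without any Co-Authored-By trailer lines."""
    kept = [line for line in message.splitlines() if not CO_AUTHOR_RE.match(line)]
    # Drop trailing blank lines left behind, then end with a single newline
    return "\n".join(kept).rstrip() + "\n"


original = (
    "Add rnn_relu operator\n"
    "\n"
    "Includes tests.\n"
    "\n"
    "Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>\n"
)
cleaned = strip_co_authored_by(original)
print(cleaned)
```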
Summary
Adds a Triton kernel implementation for rnn_relu.
Testing
to_reference(inp, True)
Performance
Test command: pytest benchmark/test_rnn_relu.py --level core (NVIDIA H20)
Multi-backend Testing
Files Changed
- src/flag_gems/ops/rnn_relu.py: Triton kernel implementation
- tests/test_rnn_relu.py: Accuracy test
- benchmark/test_rnn_relu.py: Performance benchmark
- src/flag_gems/ops/__init__.py: Register import and __all__
- src/flag_gems/__init__.py: Register to _FULL_CONFIG
- conf/operators.yaml: Add operator entry (kind: Math, stage: alpha 5.1)