Add recurrent gemma #30143
Conversation
Changing how the config specifies the architecture.
Fixed a few typos.
Still unclear on the cache?
Two issues: 1. `AttributeError: 'GriffinCausalLMOutput' object has no attribute 'attentions'`; 2. `cache_position` not passed
Adding tests that work.
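The first error in the commit list above is the classic failure mode when a custom model output class omits a field that generic generation code expects. A minimal sketch of the pattern, using a hypothetical stripped-down output dataclass (not the actual `GriffinCausalLMOutput` definition), with defensive `getattr` access:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical minimal output class mirroring the error above: it defines
# no `attentions` field, so code doing `outputs.attentions` would raise
# AttributeError.
@dataclass
class GriffinCausalLMOutput:
    logits: Tuple[float, ...]
    hidden_states: Optional[Tuple] = None

outputs = GriffinCausalLMOutput(logits=(0.1, 0.9))

# Defensive access: fall back to None when the field is absent,
# instead of letting the attribute lookup raise.
attentions = getattr(outputs, "attentions", None)
```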
Ok great, LGTM!
Co-authored-by: Lysandre Debut <hi@lysand.re>
Very nice 🔥 Thanks for implementing!
The only things that have to be addressed are the config saving and some outstanding TODO comments dotted around.
…a_to_hf.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
…mers into add-recurrent-gemma
Just making sure the slow generations are correct, and merging!
return hidden_states

# TODO refactor
def _rnn_scan(
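The `_rnn_scan` referenced in the diff computes a gated linear recurrence over the sequence. A minimal pure-Python reference sketch of that recurrence, `h[t] = a[t] * h[t-1] + x[t]` (hypothetical names; the actual implementation operates on batched tensors per channel):

```python
def rnn_scan(gates, tokens, h0=0.0):
    """Sequential linear recurrence: h[t] = gates[t] * h[t-1] + tokens[t].

    Reference sketch only; a real `_rnn_scan` would run this loop over
    tensor slices rather than Python floats.
    """
    h = h0
    out = []
    for a, x in zip(gates, tokens):
        h = a * h + x  # decay previous state by the gate, add new input
        out.append(h)
    return out

# With gates of 0.5 and constant inputs of 1.0, the state converges toward 2.0.
hs = rnn_scan([0.5, 0.5, 0.5], [1.0, 1.0, 1.0])  # [1.0, 1.5, 1.75]
```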
Hi! I work on https://github.com/proger/accelerated-scan, a package of CUDA/Triton kernels. It can speed up training and prompt processing for RecurrentGemma. Would you consider a patch for the model that uses Accelerated Scan?
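The speedup such kernels offer comes from the fact that the first-order recurrence `h[t] = a[t] * h[t-1] + x[t]` can be expressed as an associative operation on `(a, x)` pairs, which permits a parallel tree-style scan instead of a sequential loop. A hedged sketch of the underlying algebra (not the package's actual API), folded sequentially here only to show the combine rule is correct:

```python
def combine(left, right):
    """Associative combine for pairs (a, x) representing h -> a*h + x.

    Composing h -> a2*(a1*h + x1) + x2 yields (a2*a1, a2*x1 + x2).
    """
    a1, x1 = left
    a2, x2 = right
    return (a2 * a1, a2 * x1 + x2)

def prefix_scan(gates, tokens):
    """Inclusive prefix scan using the associative combine.

    A real kernel would combine pairs tree-style in parallel; folding
    left-to-right here just demonstrates equivalence to the plain loop.
    """
    acc = (1.0, 0.0)  # identity element: h -> h
    out = []
    for pair in zip(gates, tokens):
        acc = combine(acc, pair)
        out.append(acc[1])  # hidden state, assuming h0 = 0
    return out
```

Because `combine` is associative, the pairs can be merged in any grouping, which is what lets a CUDA/Triton kernel process long prompts in O(log T) parallel steps.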
Not sure but would love to share if you have a repo on the hub that integrates this? 🤗
Thanks for the tip!
What does this PR do?
Adds support for RecurrentGemma.