
Add recurrent gemma #30143


Merged
merged 97 commits into from
Apr 10, 2024

ArthurZucker
Collaborator

What does this PR do?

Adds support for recurrent gemma

molbap and others added 30 commits March 4, 2024 15:58
Changing how the config specifies the architecture.
Fixed a few typos.
Still unclear on the cache?
1. AttributeError: 'GriffinCausalLMOutput' object has no attribute 'attentions'.
2. `cache_position` not passed
Adding tests that work.
Member

LysandreJik left a comment


Ok great, LGTM!

ArthurZucker and others added 2 commits April 10, 2024 10:13
Co-authored-by: Lysandre Debut <hi@lysand.re>
Collaborator

amyeroberts left a comment


Very nice 🔥 Thanks for implementing!

The only things that have to be addressed are the config saving and some outstanding TODO comments dotted around.

@ArthurZucker
Collaborator Author

Just making sure the slow generations are correct, and merging!

ArthurZucker merged commit 0fe4405 into main on Apr 10, 2024
ArthurZucker deleted the add-recurrent-gemma branch on April 10, 2024 at 14:59
return hidden_states

# TODO refactor
def _rnn_scan(

Hi! I work on https://github.com/proger/accelerated-scan which is a package with CUDA/Triton kernels. It can speed up training and prompt reading for Recurrent Gemma. Would you consider a patch for the model that uses Accelerated Scan?
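For context on what a scan kernel would replace: assuming `_rnn_scan` evaluates a diagonal linear recurrence of the form h_t = a_t * h_{t-1} + x_t (the RG-LRU recurrence from the Griffin architecture), the minimal pure-Python sketch below computes it two ways, one step at a time and as an inclusive scan over pairs (a_t, x_t). It works on one scalar channel, and the function names `sequential_scan`, `combine`, and `parallel_scan` are hypothetical illustrations, not the transformers API and not the accelerated-scan API.

```python
from typing import List, Tuple

Pair = Tuple[float, float]  # (a, b) represents the affine map h -> a * h + b


def sequential_scan(a: List[float], x: List[float], h0: float = 0.0) -> List[float]:
    """Evaluate h_t = a_t * h_{t-1} + x_t one step at a time (T sequential steps)."""
    out = []
    h = h0
    for a_t, x_t in zip(a, x):
        h = a_t * h + x_t
        out.append(h)
    return out


def combine(left: Pair, right: Pair) -> Pair:
    """Compose two affine maps, applying `left` first and then `right`.

    right(left(h)) = a2 * (a1 * h + b1) + b2 = (a1 * a2) * h + (a2 * b1 + b2).
    This operator is associative, which is what makes a parallel scan possible.
    """
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def parallel_scan(a: List[float], x: List[float], h0: float = 0.0) -> List[float]:
    """Hillis-Steele inclusive scan over (a_t, x_t) pairs using `combine`.

    Each round doubles `offset`, so only O(log T) rounds are needed; within a
    round every position can be updated independently (here it is just a list
    comprehension, but on a GPU each position would be its own thread).
    """
    elems: List[Pair] = list(zip(a, x))
    offset = 1
    while offset < len(elems):
        # The comprehension reads the previous round's `elems` for every index.
        elems = [
            combine(elems[i - offset], e) if i >= offset else e
            for i, e in enumerate(elems)
        ]
        offset *= 2
    # elems[t] now maps the initial state h0 directly to h_t.
    return [a_pre * h0 + b_pre for a_pre, b_pre in elems]


if __name__ == "__main__":
    a = [0.5, 0.9, 0.1, 1.0, 0.3]
    x = [1.0, 2.0, -1.0, 0.5, 4.0]
    print(sequential_scan(a, x, h0=0.25))
    print(parallel_scan(a, x, h0=0.25))
```

Both functions produce the same states (up to floating-point rounding); setting a_t = 0 restarts the recurrence at step t, e.g. `sequential_scan([1.0, 0.0, 1.0], [1.0, 2.0, 3.0])` gives `[1.0, 2.0, 5.0]`. The sequential loop has T dependent steps, while the associative `combine` lets parallel scan kernels in general evaluate the same recurrence in logarithmic depth.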

Collaborator Author


Not sure, but we would love to share it if you have a repo on the Hub that integrates this! 🤗


Thanks for the tip!


9 participants