Support for RecurrentGemma (Gemma with Griffin Architecture) #6564
Comments
This is the PR merged in gemma.cpp: https://github.com/google/gemma.cpp/pull/136/files
This issue was marked as stale, but shouldn’t supporting more efficient architectures be a priority?
Will this or Griffin be in the upcoming Gemma 2 model(s)? I say "this or Griffin" because, FWIW, the paper mentions a slight difference between RecurrentGemma and Griffin.
Seems like Griffin or a variant could be the "brand new architecture designed for breakthrough performance and efficiency" in today's Gemma 2 announcement, no?
Now Google has released a 9B version of RecurrentGemma (arxiv link), which seems to score similarly to Gemma-7b while supposedly being far more efficient (source). Any chance llama.cpp can support RecurrentGemma, @ggerganov? I wish I had the skill to implement it myself, but I have no familiarity with llama.cpp's inner workings; I'm just a user of the software.
Will be added, though we probably have to merge Jamba (#7531) and then see how to adapt |
Great news. People often forget about more efficient architectures; supporting this will speed so many things up!
Is RecurrentGemma going to come to Ollama or not? It uses Google's custom architecture named Griffin, right?
Any news? It's still unsupported...
This issue was closed because it has been inactive for 14 days since being marked as stale. |
Stalebot is an annoying concept. People hate it when commenters leave low-effort comments like “bump”, but then the stalebot closes the issue if no one does. RecurrentGemma still isn’t supported, sadly.
@ggerganov how do we reopen this issue? |
Any updates? |
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Feature Description
Please provide a detailed written description of what you were trying to do, and what you expected llama.cpp to do as an enhancement.

Google’s newly released model is a hybrid architecture based on attention and a recurrent hidden state: https://huggingface.co/google/recurrentgemma-2b
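For context, the recurrent half of this hybrid is Griffin's RG-LRU (Real-Gated Linear Recurrent Unit). Below is a minimal, hedged C++ sketch of a single-channel RG-LRU step as described in the Griffin paper (arXiv:2402.19427); the scalar gate weights `w_r`, `w_i` and the decay parameter `lambda` are simplified stand-ins for the per-channel linear projections a real layer would use.

```cpp
#include <cmath>

static float sigmoidf(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// One single-channel RG-LRU step (simplified from the Griffin paper):
//   r_t = sigmoid(w_r * x_t)                       recurrence gate
//   i_t = sigmoid(w_i * x_t)                       input gate
//   a_t = a^(c * r_t), with a = sigmoid(lambda) and c = 8 (fixed in the paper)
//   h_t = a_t * h_{t-1} + sqrt(1 - a_t^2) * (i_t * x_t)
// The hidden state h is fixed-size per channel, unlike a KV cache that
// grows with context length.
float rg_lru_step(float x, float &h, float w_r, float w_i, float lambda) {
    const float c = 8.0f;
    const float r = sigmoidf(w_r * x);
    const float i = sigmoidf(w_i * x);
    const float a = std::exp(c * r * std::log(sigmoidf(lambda)));
    h = a * h + std::sqrt(1.0f - a * a) * (i * x);
    return h;
}
```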
Motivation
Please provide a detailed written description of reasons why this feature is necessary and how it is useful to llama.cpp users.

A good and open LLM with a novel architecture.
Possible Implementation
If you have an idea as to how it can be implemented, please write a detailed description. Feel free to give links to external sources or share visuals that might be helpful to understand the details better.
Unlike Jamba (#6372), the model is very small, so most computers can run it for inference.
Hybrid architectures are likely the trend of the future. I hope llama.cpp can support this model and, if possible, other hybrid architectures.
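As a rough illustration of what "hybrid" means for an inference engine, here is a hedged sketch of Griffin's layer layout: the paper interleaves two recurrent (RG-LRU) blocks with one local sliding-window attention block. The names below are hypothetical and are not llama.cpp APIs; the point is that only the local-attention layers need a (bounded) KV cache, while the recurrent layers carry a small fixed-size state per sequence.

```cpp
// Hypothetical sketch of a Griffin-style hybrid layer schedule; these names
// are illustrative and do not exist in llama.cpp.
enum class LayerKind { Recurrent, LocalAttention };

// Griffin repeats the pattern: recurrent, recurrent, local attention.
LayerKind layer_kind(int layer_idx) {
    return (layer_idx % 3 == 2) ? LayerKind::LocalAttention
                                : LayerKind::Recurrent;
}
```

This is also why the comment above points at the Jamba work (#7531): both models need the cache machinery to track per-layer recurrent state alongside (or instead of) KV entries.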