
Support for RecurrentGemma (Gemma with Griffin Architecture) #6564

Open
4 tasks done
TechxGenus opened this issue Apr 9, 2024 · 12 comments
Labels: enhancement (New feature or request) · model (Model specific) · stale

Comments

@TechxGenus

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Feature Description

Please provide a detailed written description of what you were trying to do, and what you expected llama.cpp to do as an enhancement.

Google’s newly released model, a hybrid architecture that mixes attention with recurrent (hidden-state) layers: https://huggingface.co/google/recurrentgemma-2b
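
For reference, the model already runs under Hugging Face transformers (assuming a transformers version that includes RecurrentGemma support), so something like the sketch below works there; native llama.cpp/GGUF support is what this issue is asking for:

```python
# A minimal sketch of running RecurrentGemma through Hugging Face
# transformers (assumes a version with RecurrentGemma support).
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/recurrentgemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The Griffin architecture combines", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```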

Motivation

Please provide a detailed written description of reasons why this feature is necessary and how it is useful to llama.cpp users.

A capable, open LLM with a novel architecture.

Possible Implementation

If you have an idea as to how it can be implemented, please write a detailed description. Feel free to give links to external sources or share visuals that might be helpful to understand the details better.

Unlike Jamba (#6372), this model is small enough that most computers can run inference on it.
Hybrid architectures are likely to become more common, so it would be great if llama.cpp could support this one and, if possible, other hybrid architectures as well.

@TechxGenus TechxGenus added the enhancement New feature or request label Apr 9, 2024
@vasileermicioi

This is the PR that was merged in gemma.cpp: https://github.com/google/gemma.cpp/pull/136/files

@phymbert phymbert added the model Model specific label Apr 9, 2024
@github-actions github-actions bot added the stale label May 10, 2024
@coder543

This issue was marked as stale, but shouldn’t supporting more efficient architectures be a priority?

@fat-tire
Contributor

Will this or Griffin be in the upcoming Gemma 2 model(s)? I say "this or Griffin" because the paper mentions a slight difference between RecurrentGemma and Griffin, FWIW:

We make only a single modification to the Griffin architecture (De et al., 2024), which is to multiply the input embeddings by a constant equal to the square root of model width. The input and output embeddings are tied, but this factor is not applied to the output.

Seems like Griffin or a variant could be the "brand new architecture designed for breakthrough performance and efficiency" in today's Gemma 2 announcement, no?
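
For concreteness, a minimal sketch of that one quoted modification (hypothetical names, illustrative sizes, not RecurrentGemma's real config):

```python
import math
import torch
import torch.nn as nn

# Illustrative sizes only, not RecurrentGemma's actual configuration.
vocab_size, width = 256_000, 2560
embed = nn.Embedding(vocab_size, width)

def embed_inputs(token_ids: torch.Tensor) -> torch.Tensor:
    # Input embeddings are scaled by sqrt(model width)...
    return embed(token_ids) * math.sqrt(width)

def output_logits(hidden: torch.Tensor) -> torch.Tensor:
    # ...but the tied output projection reuses the same weight
    # matrix without that factor.
    return hidden @ embed.weight.T
```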

@github-actions github-actions bot removed the stale label May 15, 2024
@coder543

coder543 commented Jun 11, 2024

Now Google has released a 9B version of RecurrentGemma (arxiv link), which seems to score similarly to Gemma-7b, while supposedly being far more efficient:

[Image: max_throughput benchmark plot] (source)

Any chance llama.cpp can support RecurrentGemma, @ggerganov? I wish I had the skill to implement it myself, but I have no familiarity with llama.cpp's inner workings; I'm just a user of the software.

@ggerganov
Owner

Will be added, though we probably have to merge Jamba (#7531) and then see how to adapt llama_cache to support the new Griffin layers
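
To illustrate the caching difference (a hypothetical sketch, not the actual llama_cache interface):

```python
import torch

# Hypothetical sketch, NOT llama.cpp's real llama_cache API: it only
# illustrates why Griffin layers need different caching. Attention
# layers store K/V that grow with context length; recurrent layers
# keep a fixed-size state that is overwritten in place each token.
class HybridLayerCache:
    def __init__(self, kind: str, width: int):
        self.kind = kind  # "attention" or "recurrent"
        if kind == "recurrent":
            self.state = torch.zeros(width)  # constant memory per layer
        else:
            self.keys, self.values = [], []  # one entry per past token

    def update(self, k=None, v=None, new_state=None):
        if self.kind == "recurrent":
            self.state = new_state           # replace, never append
        else:
            self.keys.append(k)
            self.values.append(v)
```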

@DuckyBlender

Great news. People often forget about more efficient architectures; supporting this will speed so many things up!

@Meshwa428

Is RecurrentGemma going to come to Ollama or not?

It uses Google's custom architecture named Griffin, right?

@0wwafa

0wwafa commented Jul 11, 2024

Any news? google/recurrentgemma-2b-it and google/recurrentgemma-9b-it are still unsupported.

@github-actions github-actions bot added the stale label Aug 11, 2024
@github-actions

This issue was closed because it has been inactive for 14 days since being marked as stale.

@coder543

Stalebot is an annoying concept. People hate it when commenters leave low-effort comments like “bump”, but then stalebot closes the issue if no one does.

RecurrentGemma still isn’t supported, sadly.

@coder543

@ggerganov how do we reopen this issue?

@Meshwa428

Any updates?

@Green-Sky Green-Sky removed the stale label Aug 26, 2024
@Green-Sky Green-Sky reopened this Aug 26, 2024
@github-actions github-actions bot added the stale label Sep 26, 2024
@compilade compilade removed the stale label Oct 3, 2024
@github-actions github-actions bot added the stale label Nov 3, 2024