huggingface/huggingface-gemma-recipes

Hugging Face Gemma Recipes

🤗💎 Welcome! This repository contains minimal recipes to get started quickly with the Gemma family of models.

Note

Gemma 3n Conversational Fine tuning 2B on a Free Colab Notebook: Open In Colab

Gemma 3n Conversational Fine tuning 4B on a Free Colab Notebook: Open In Colab

Gemma 3n Multimodal Finetuning 2B/4B on a Free Colab Notebook: Open In Colab

Multimodal inference using Gemma 3n via pipeline: Open In Colab

Getting Started

To quickly run a Gemma 💎 model on your machine, install the latest versions of timm (for the vision encoder) and 🤗 transformers. The same two packages cover both inference and fine-tuning.

$ pip install -U -q transformers timm
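
To confirm that the install picked up both packages, a check along these lines can help. It is a minimal sketch that uses only the Python standard library; the helper name `installed_versions` is hypothetical and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # The two packages installed by the command above.
    for name, ver in installed_versions(["transformers", "timm"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

A `None` entry means the package is not visible to the current interpreter, which usually points to a different virtual environment than the one `pip` installed into.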

Inference with pipeline

The easiest way to start using Gemma 3n is by using the pipeline abstraction in transformers:

import torch
from transformers import pipeline

pipe = pipeline(
   "image-text-to-text",
   model="google/gemma-3n-E4B-it", # or "google/gemma-3n-E2B-it"
   device="cuda",
   torch_dtype=torch.bfloat16
)

messages = [
   {
       "role": "user",
       "content": [
           {"type": "image", "url": "https://huggingface.co/datasets/ariG23498/demo-data/resolve/main/airplane.jpg"},
           {"type": "text", "text": "Describe this image"}
       ]
   }
]

output = pipe(text=messages, max_new_tokens=32)
print(output[0]["generated_text"][-1]["content"])
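
The indexing in the last line suggests that, for chat-style input, the pipeline returns the whole conversation and appends the model's reply as the final message. The sketch below illustrates that shape with a hand-written stand-in rather than a real pipeline call; `last_reply` is a hypothetical helper and the reply text is made up.

```python
def last_reply(output):
    """Return the content of the final message in the first generated conversation."""
    conversation = output[0]["generated_text"]  # list of {"role", "content"} dicts
    return conversation[-1]["content"]


# Hand-written stand-in for a pipeline result; the strings are illustrative only.
mock_output = [
    {
        "generated_text": [
            {"role": "user", "content": [{"type": "text", "text": "Describe this image"}]},
            {"role": "assistant", "content": "An airplane on a runway."},
        ]
    }
]

print(last_reply(mock_output))
```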

Detailed inference with transformers

Initialize the model and the processor from the Hub, and write the model_generation function that takes care of processing the prompts and running the inference on the model.

from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

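model_id = "google/gemma-3n-e4b-it" # google/gemma-3n-e2b-it
# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id).to(device)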

def model_generation(model, messages):
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )
    input_len = inputs["input_ids"].shape[-1]

    inputs = inputs.to(model.device, dtype=model.dtype)

    with torch.inference_mode():
        generation = model.generate(**inputs, max_new_tokens=32, disable_compile=False)
        generation = generation[:, input_len:]

    decoded = processor.batch_decode(generation, skip_special_tokens=True)
    print(decoded[0])
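
The slice `generation[:, input_len:]` is there because `generate` returns the prompt tokens followed by the newly generated ones, so the first `input_len` positions have to be dropped before decoding. A minimal sketch of the same idea, using plain Python lists in place of tensors and made-up token ids (`strip_prompt` is a hypothetical helper):

```python
def strip_prompt(sequences, input_len):
    """Drop the first input_len token ids from every generated sequence."""
    return [seq[input_len:] for seq in sequences]


prompt_ids = [101, 102, 103]          # made-up prompt token ids
generated = [prompt_ids + [7, 8, 9]]  # prompt followed by new tokens, batch of one

new_tokens = strip_prompt(generated, input_len=len(prompt_ids))
print(new_tokens)  # [[7, 8, 9]]
```

Without this step the decoded string would start with the prompt text before the model's answer.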

Then call it with the modality you need:

Text only

# Text Only

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is the capital of France?"}
        ]
    }
]
model_generation(model, messages)

Interleaved with Audio

# Interleaved with Audio

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe the following speech segment in English:"},
            {"type": "audio", "audio": "https://huggingface.co/datasets/ariG23498/demo-data/resolve/main/speech.wav"},
        ]
    }
]
model_generation(model, messages)

Interleaved with Image/Video

# Interleaved with Image

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/ariG23498/demo-data/resolve/main/airplane.jpg"},
            {"type": "text", "text": "Describe this image."}
        ]
    }
]
model_generation(model, messages)
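
The three examples share the same message layout and differ only in the entries of `content`. One way to avoid repeating the boilerplate is a small builder like the sketch below. The name `user_message` is hypothetical, and the ordering (image before the text, audio after it) simply mirrors the examples above.

```python
def user_message(text, image=None, audio=None):
    """Build a single-turn user message in the layout used by the examples above."""
    content = []
    if image is not None:
        content.append({"type": "image", "image": image})
    content.append({"type": "text", "text": text})
    if audio is not None:
        content.append({"type": "audio", "audio": audio})
    return [{"role": "user", "content": content}]


messages = user_message(
    "Describe this image.",
    image="https://huggingface.co/datasets/ariG23498/demo-data/resolve/main/airplane.jpg",
)
# model_generation(model, messages)  # needs the model and processor loaded earlier
```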

Inference

Gemma 3n

Notebooks

Function Calling

Gemma 3n

Notebooks

Fine Tuning

We include a series of notebooks and scripts for fine-tuning the models.

Gemma 3n

Notebooks

Scripts

Gemma 3

RAG

Gemma 3n

Before fine-tuning the model, ensure all dependencies are installed:

$ pip install -U -q -r requirements.txt

✨ Bonus: We've also experimented with adding object detection 🔍 capabilities to Gemma 3. You can explore that work in this dedicated repo.

About

Inference, Fine Tuning and many more recipes with Gemma family of models
