🤗 Welcome! This repository contains minimal recipes to get started quickly with the Gemma family of models.
Note:
- Gemma 3n Conversational Fine tuning 2B on a Free Colab Notebook
- Gemma 3n Conversational Fine tuning 4B on a Free Colab Notebook
- Gemma 3n Multimodal Fine tuning 2B/4B on a Free Colab Notebook
To quickly run a Gemma model on your machine, install the latest versions of timm (for the vision encoder) and 🤗 Transformers, whether you want to run inference or fine-tune the model.
```bash
$ pip install -U -q transformers timm
```
The easiest way to start using Gemma 3n is by using the pipeline abstraction in transformers:
```python
import torch
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3n-E4B-it",  # or "google/gemma-3n-E2B-it"
    device="cuda",
    torch_dtype=torch.bfloat16,
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/ariG23498/demo-data/resolve/main/airplane.jpg"},
            {"type": "text", "text": "Describe this image"}
        ]
    }
]

output = pipe(text=messages, max_new_tokens=32)
print(output[0]["generated_text"][-1]["content"])
```
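The pipeline returns the whole conversation with the model's reply appended as the last message, which is why we index `[-1]["content"]` to print only the generated answer. If you need more control over preprocessing and generation, you can drop down to the model and processor directly.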
Initialize the model and the processor from the Hub, and write a `model_generation` function that takes care of processing the prompts and running inference on the model.
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

model_id = "google/gemma-3n-e4b-it"  # or google/gemma-3n-e2b-it
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id).to(device)

def model_generation(model, messages):
    # Turn the chat messages (text, image, audio) into model-ready tensors.
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )
    input_len = inputs["input_ids"].shape[-1]
    inputs = inputs.to(model.device, dtype=model.dtype)

    with torch.inference_mode():
        generation = model.generate(**inputs, max_new_tokens=32, disable_compile=False)
        generation = generation[:, input_len:]

    decoded = processor.batch_decode(generation, skip_special_tokens=True)
    print(decoded[0])
```
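Here `apply_chat_template` fetches and preprocesses any image or audio referenced in the messages, and slicing the generation at `input_len` strips the prompt tokens so only the newly generated text is decoded.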
We can then call it with the specific modality we want to use:
```python
# Text Only
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is the capital of France?"}
        ]
    }
]
model_generation(model, messages)

# Interleaved with Audio
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe the following speech segment in English:"},
            {"type": "audio", "audio": "https://huggingface.co/datasets/ariG23498/demo-data/resolve/main/speech.wav"},
        ]
    }
]
model_generation(model, messages)

# Interleaved with Image
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/ariG23498/demo-data/resolve/main/airplane.jpg"},
            {"type": "text", "text": "Describe this image."}
        ]
    }
]
model_generation(model, messages)
```
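Since Gemma 3n accepts text, image, and audio inputs in the same conversation, you can also interleave several media types in one turn. The snippet below is an illustrative sketch that reuses the `model_generation` helper and the demo files above; the combined prompt is our own example, not one from the notebooks.

```python
# Interleaved with Image and Audio (illustrative sketch)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/ariG23498/demo-data/resolve/main/airplane.jpg"},
            {"type": "audio", "audio": "https://huggingface.co/datasets/ariG23498/demo-data/resolve/main/speech.wav"},
            {"type": "text", "text": "Describe the image, then transcribe the audio."}
        ]
    }
]
model_generation(model, messages)
```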
We include a series of notebooks and scripts for fine-tuning the models:
- Gemma 3n Conversational Fine tuning 2B on free Colab T4
- Gemma 3n Conversational Fine tuning 4B with Unsloth on free Colab T4
- Gemma 3n Multimodal Fine tuning 2B/4B with Unsloth on free Colab T4
- Fine tuning Gemma 3n on audio
- Fine tuning Gemma 3n on GUI Grounding
- Fine tuning Gemma 3n on video+audio using FineVideo (all modalities)
- Fine tuning Gemma 3n on images using TRL
- Fine tuning Gemma 3n on images (script)
- Fine tuning Gemma 3n on audio (script)
- Fine tuning Gemma 3n on video+audio using FineVideo (all modalities)
- Reinforcement Learning (GRPO) on Gemma 3 with Unsloth and TRL
- Vision fine tuning Gemma 3 4B with Unsloth
- Conversational fine tuning Gemma 3 4B with Unsloth
Before fine-tuning the model, ensure all dependencies are installed:
```bash
$ pip install -U -q -r requirements.txt
```
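If you want a quick feel for what these recipes do before opening a notebook, the sketch below shows a minimal supervised fine-tuning loop with TRL's `SFTTrainer`. It is an illustrative example only: the dataset (`trl-lib/Capybara`), the LoRA settings, and the hyperparameters are placeholders, not the exact configuration used in the notebooks or scripts above.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Placeholder conversational dataset; swap in your own formatted data.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="google/gemma-3n-E2B-it",  # loaded from the Hub by id
    args=SFTConfig(
        output_dir="gemma-3n-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        max_steps=100,  # keep the demo run short
    ),
    train_dataset=dataset,
    # LoRA keeps memory use low enough for modest GPUs.
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
trainer.train()
```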
✨ Bonus: We've also experimented with adding object detection capabilities to Gemma 3. You can explore that work in this dedicated repo.