This project provides a robust REST API, built with FastAPI and Docker, to manage and interact with Google's Gemma Translate models for on-device string translation.
- Translation services: translate text strings via the REST API
- Multiple model support: serve the 4B, 12B, or 27B TranslateGemma GGUF variants
- Automatic API Docs: Interactive API documentation powered by Swagger UI and ReDoc.
- FastAPI for the core web framework.
- Uvicorn as the ASGI server.
- Docker for containerization and easy deployment.
- Pydantic for data validation and settings management.
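To show how these pieces fit together, here is a minimal sketch of a FastAPI service with a Pydantic request model. The endpoint path, field names, and placeholder translation logic are illustrative assumptions, not the project's actual code; the real application lives in app/main.py, and its generated /docs page shows the true schema.

```python
# Minimal sketch of a FastAPI + Pydantic translation service.
# NOTE: the endpoint path, field names, and placeholder logic below are
# assumptions for illustration; see app/main.py and /docs for the real API.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Gemma Translate API")

class TranslateRequest(BaseModel):
    text: str
    source_lang: str = "en"
    target_lang: str = "de"

class TranslateResponse(BaseModel):
    translation: str

@app.post("/translate", response_model=TranslateResponse)
def translate(req: TranslateRequest) -> TranslateResponse:
    # A real implementation would prompt a loaded GGUF model via llama-cpp-python here.
    return TranslateResponse(translation=f"[{req.target_lang}] {req.text}")
```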
- Docker Desktop
- Conda (or another Python environment manager)
- Python 3.10+
Create and activate a Conda environment:
conda create -n translate python=3.11
conda activate translate
Install the dependencies, including huggingface_hub, which provides the hf tool used to download the models:
pip install "fastapi[standard]" "uvicorn[standard]" httpx llama-cpp-python huggingface_hub
We're going to fetch the GGUF model files from these repositories:
https://huggingface.co/bullerwins/translategemma-27b-it-GGUF
https://huggingface.co/bullerwins/translategemma-12b-it-GGUF
https://huggingface.co/bullerwins/translategemma-4b-it-GGUF
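If you want to see exactly which GGUF files each repository offers before downloading, a short huggingface_hub sketch (shown here against the 4B repo) can enumerate them:

```python
# List the GGUF files published in one of the TranslateGemma repositories.
from huggingface_hub import list_repo_files

for name in list_repo_files("bullerwins/translategemma-4b-it-GGUF"):
    if name.endswith(".gguf"):
        print(name)
```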
For comparison, here's a table of all of the above GGUF models and their quantizations:
| Model Family | Quantization Level | File Name | File Size | Quality Level |
|---|---|---|---|---|
| Gemma 4B | Q3_K_L | translategemma-4b-it-Q3_K_L.gguf | 2.24 GB | Low-Medium |
| | Q4_K_S | translategemma-4b-it-Q4_K_S.gguf | 2.38 GB | Medium |
| | Q4_K_M | translategemma-4b-it-Q4_K_M.gguf | 2.49 GB | Balanced |
| | Q5_K_S | translategemma-4b-it-Q5_K_S.gguf | 2.76 GB | High |
| | Q5_K_M | translategemma-4b-it-Q5_K_M.gguf | 2.83 GB | High+ |
| | Q6_K | translategemma-4b-it-Q6_K.gguf | 3.19 GB | Near Lossless |
| | Q8_0 | translategemma-4b-it-Q8_0.gguf | 4.13 GB | Reference |
| Gemma 12B | Q3_K_L | translategemma-12b-it-Q3_K_L.gguf | 6.48 GB | Medium |
| | Q4_K_S | translategemma-12b-it-Q4_K_S.gguf | 6.94 GB | High |
| | Q4_K_M | translategemma-12b-it-Q4_K_M.gguf | 7.30 GB | High (Recommended) |
| | Q5_K_S | translategemma-12b-it-Q5_K_S.gguf | 8.23 GB | Very High |
| | Q5_K_M | translategemma-12b-it-Q5_K_M.gguf | 8.45 GB | Very High |
| | Q6_K | translategemma-12b-it-Q6_K.gguf | 9.66 GB | Near Lossless |
| | Q8_0 | translategemma-12b-it-Q8_0.gguf | 12.5 GB | Reference |
| Gemma 27B | Q3_K_L | translategemma-27b-it-Q3_K_L.gguf | 14.5 GB | High |
| | Q4_K_S | translategemma-27b-it-Q4_K_S.gguf | 15.7 GB | Very High |
| | Q4_K_M | translategemma-27b-it-Q4_K_M.gguf | 16.5 GB | Very High (Recommended) |
| | Q5_K_S | translategemma-27b-it-Q5_K_S.gguf | 18.8 GB | Excellent |
| | Q5_K_M | translategemma-27b-it-Q5_K_M.gguf | 19.3 GB | Excellent |
| | Q6_K | translategemma-27b-it-Q6_K.gguf | 22.2 GB | Near Lossless |
| | Q8_0 | translategemma-27b-it-Q8_0.gguf | 28.7 GB | Reference |
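If you're unsure which quantization to pick, a rough sketch like the one below can shortlist files that fit a memory budget. The sizes are copied from the table; the 2 GB headroom is only a guess, since actual memory use also depends on context length and runtime overhead.

```python
# Shortlist quantizations from the table above that fit a given memory budget (GB).
# The headroom value is a rough assumption; real usage varies with context size, etc.
SIZES_GB = {
    ("Gemma 4B", "Q3_K_L"): 2.24,  ("Gemma 4B", "Q4_K_M"): 2.49,  ("Gemma 4B", "Q8_0"): 4.13,
    ("Gemma 12B", "Q3_K_L"): 6.48, ("Gemma 12B", "Q4_K_M"): 7.30, ("Gemma 12B", "Q8_0"): 12.5,
    ("Gemma 27B", "Q3_K_L"): 14.5, ("Gemma 27B", "Q4_K_M"): 16.5, ("Gemma 27B", "Q8_0"): 28.7,
}

def shortlist(budget_gb: float, headroom_gb: float = 2.0):
    """Return (size, family, quant) tuples whose file size plus headroom fits the budget."""
    return sorted(
        (size, family, quant)
        for (family, quant), size in SIZES_GB.items()
        if size + headroom_gb <= budget_gb
    )

for size, family, quant in shortlist(budget_gb=16.0):
    print(f"{family} {quant}: {size} GB")
```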
Download one of the following Gemma Translate models:
Gemma 4B:
hf download bullerwins/translategemma-4b-it-GGUF translategemma-4b-it-Q8_0.gguf --local-dir app/models/translategemma-4b-it-Q8_0
hf download bullerwins/translategemma-4b-it-GGUF translategemma-4b-it-Q6_K.gguf --local-dir app/models/translategemma-4b-it-Q6_K
hf download bullerwins/translategemma-4b-it-GGUF translategemma-4b-it-Q5_K_S.gguf --local-dir app/models/translategemma-4b-it-Q5_K_S
hf download bullerwins/translategemma-4b-it-GGUF translategemma-4b-it-Q5_K_M.gguf --local-dir app/models/translategemma-4b-it-Q5_K_M
hf download bullerwins/translategemma-4b-it-GGUF translategemma-4b-it-Q4_K_S.gguf --local-dir app/models/translategemma-4b-it-Q4_K_S
hf download bullerwins/translategemma-4b-it-GGUF translategemma-4b-it-Q4_K_M.gguf --local-dir app/models/translategemma-4b-it-Q4_K_M
hf download bullerwins/translategemma-4b-it-GGUF translategemma-4b-it-Q3_K_L.gguf --local-dir app/models/translategemma-4b-it-Q3_K_L
Gemma 12B:
hf download bullerwins/translategemma-12b-it-GGUF translategemma-12b-it-Q8_0.gguf --local-dir app/models/translategemma-12b-it-Q8_0
hf download bullerwins/translategemma-12b-it-GGUF translategemma-12b-it-Q6_K.gguf --local-dir app/models/translategemma-12b-it-Q6_K
hf download bullerwins/translategemma-12b-it-GGUF translategemma-12b-it-Q5_K_S.gguf --local-dir app/models/translategemma-12b-it-Q5_K_S
hf download bullerwins/translategemma-12b-it-GGUF translategemma-12b-it-Q5_K_M.gguf --local-dir app/models/translategemma-12b-it-Q5_K_M
hf download bullerwins/translategemma-12b-it-GGUF translategemma-12b-it-Q4_K_S.gguf --local-dir app/models/translategemma-12b-it-Q4_K_S
hf download bullerwins/translategemma-12b-it-GGUF translategemma-12b-it-Q4_K_M.gguf --local-dir app/models/translategemma-12b-it-Q4_K_M
hf download bullerwins/translategemma-12b-it-GGUF translategemma-12b-it-Q3_K_L.gguf --local-dir app/models/translategemma-12b-it-Q3_K_L
Gemma 27B:
hf download bullerwins/translategemma-27b-it-GGUF translategemma-27b-it-Q8_0.gguf --local-dir app/models/translategemma-27b-it-Q8_0
hf download bullerwins/translategemma-27b-it-GGUF translategemma-27b-it-Q6_K.gguf --local-dir app/models/translategemma-27b-it-Q6_K
hf download bullerwins/translategemma-27b-it-GGUF translategemma-27b-it-Q5_K_S.gguf --local-dir app/models/translategemma-27b-it-Q5_K_S
hf download bullerwins/translategemma-27b-it-GGUF translategemma-27b-it-Q5_K_M.gguf --local-dir app/models/translategemma-27b-it-Q5_K_M
hf download bullerwins/translategemma-27b-it-GGUF translategemma-27b-it-Q4_K_S.gguf --local-dir app/models/translategemma-27b-it-Q4_K_S
hf download bullerwins/translategemma-27b-it-GGUF translategemma-27b-it-Q4_K_M.gguf --local-dir app/models/translategemma-27b-it-Q4_K_M
hf download bullerwins/translategemma-27b-it-GGUF translategemma-27b-it-Q3_K_L.gguf --local-dir app/models/translategemma-27b-it-Q3_K_L
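The same downloads can be scripted with huggingface_hub directly if you prefer Python over the hf CLI. This mirrors one of the commands above; swap the repo and filename for the variant you want:

```python
# Programmatic equivalent of one of the `hf download` commands above.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="bullerwins/translategemma-4b-it-GGUF",
    filename="translategemma-4b-it-Q4_K_M.gguf",
    local_dir="app/models/translategemma-4b-it-Q4_K_M",
)
print("Model saved to:", path)
```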
This is the easiest and recommended way to run the application.
- Build the Docker image:
docker build -t fastapi_gemma_translate .
- Run the Docker container: This command runs the container in detached mode (-d) and maps port 8080 on your host to port 8080 in the container.
docker run -d --name ai_container_cpu -p 127.0.0.1:8080:8080 -v C:/Users/username/Desktop/git/fastapi-gemma-translate/_models:/code/models fastapi_gemma_translate
Alternatively, you can pull and run my Docker image:
- Pull the Docker image:
docker image pull grctest/fastapi_gemma_translate
- Run the Docker container: This command runs the container in detached mode (-d) and maps port 8080 on your host to port 8080 in the container.
docker run -d --name ai_container_cpu -p 127.0.0.1:8080:8080 -v C:/Users/username/Desktop/git/fastapi-gemma-translate/_models:/code/models grctest/fastapi_gemma_translate
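Either way, once the container has started you can confirm the API is reachable with a quick httpx check against FastAPI's default documentation routes; the paths listed in openapi.json are whatever endpoints the app actually exposes.

```python
# Confirm the containerized API is reachable on the mapped port.
import httpx

docs = httpx.get("http://127.0.0.1:8080/docs", timeout=10.0)
print("Swagger UI status:", docs.status_code)  # expect 200 once startup completes

spec = httpx.get("http://127.0.0.1:8080/openapi.json", timeout=10.0)
print("Exposed endpoints:", sorted(spec.json().get("paths", {})))
```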
The docker run commands above use the CPU-only image. If you want Nvidia CUDA GPU support, you'll need one of the CUDA Dockerfiles instead, either by:
- Building the Docker image:
docker build -t fastapi_gemma_translate_cuda:legacy -f LegacyCudaDockerfile .
docker build -t fastapi_gemma_translate_cuda:mainstream -f MainstreamCudaDockerfile .
docker build -t fastapi_gemma_translate_cuda:future -f FutureCudaDockerfile .
or, tagged for the grctest/ namespace:
docker build -t grctest/fastapi_gemma_translate_cuda:legacy -f LegacyCudaDockerfile .
docker build -t grctest/fastapi_gemma_translate_cuda:mainstream -f MainstreamCudaDockerfile .
docker build -t grctest/fastapi_gemma_translate_cuda:future -f FutureCudaDockerfile .
- Pulling the Docker image:
docker image pull grctest/fastapi_gemma_translate_cuda:legacy
Then run the container with the --gpus flag:
docker run --gpus all -d --name ai_container_gpu -p 127.0.0.1:8080:8080 -v C:/Users/username/Desktop/git/fastapi-gemma-translate/_models:/code/models -e LLAMA_N_GPU_LAYERS=-1 fastapi_gemma_translate_cuda:legacy
docker run --gpus all -d --name ai_container_cuda -p 127.0.0.1:8080:8080 -v C:/Users/username/Desktop/git/fastapi-gemma-translate/_models:/code/models -e LLAMA_N_GPU_LAYERS=-1 grctest/fastapi_gemma_translate_cuda:legacy
Note: Replace C:/Users/username/Desktop/git/fastapi-gemma-translate/_models with the path you downloaded the GGUF model folders and files to.
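The LLAMA_N_GPU_LAYERS environment variable controls GPU offload, with -1 meaning all layers. Below is a rough sketch of how such a setting typically feeds into llama-cpp-python; how the app itself reads the variable, the model path, and the prompt format are assumptions for illustration only.

```python
# Sketch: offloading layers to the GPU with llama-cpp-python.
# How the real app consumes LLAMA_N_GPU_LAYERS, the model path, and the prompt
# wording are assumptions for illustration.
import os
from llama_cpp import Llama

n_gpu_layers = int(os.environ.get("LLAMA_N_GPU_LAYERS", "0"))  # -1 = offload all layers

llm = Llama(
    model_path="app/models/translategemma-4b-it-Q4_K_M/translategemma-4b-it-Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=n_gpu_layers,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Translate to German: Hello, world!"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```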
Container GPU variants:
- Legacy: Pascal / 10xx Nvidia cards
- Mainstream: Turing to Ada / 20xx, 30xx, 40xx, A100 Nvidia cards
- Future: Blackwell / 50xx Nvidia cards
For development, you can run the application directly with Uvicorn, which enables auto-reloading.
uvicorn app.main:app --host 0.0.0.0 --port 8080 --reload
Once the server is running, you can access the interactive API documentation:
- Swagger UI: http://127.0.0.1:8080/docs
- ReDoc: http://127.0.0.1:8080/redoc
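As a quick usage example, here's how a client might call a translation endpoint with httpx. The /translate path and payload fields are assumptions for illustration; check the Swagger UI above for the actual endpoint names and request schema.

```python
# Example client call; the endpoint path and payload fields are assumptions --
# consult http://127.0.0.1:8080/docs for the real schema.
import httpx

payload = {
    "text": "Hello, world!",
    "source_lang": "en",
    "target_lang": "de",
}

resp = httpx.post("http://127.0.0.1:8080/translate", json=payload, timeout=120.0)
resp.raise_for_status()
print(resp.json())
```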
This project is licensed under the MIT License. See the LICENSE file for details.