This project provides a robust REST API, built with FastAPI and Docker, to manage and interact with Google's Gemma Translate models for on-device string translation.
- Translation services: translate text strings via the REST API
- Multiple model support: serve the 4B, 12B, or 27B TranslateGemma GGUF variants
- Automatic API Docs: Interactive API documentation powered by Swagger UI and ReDoc.
- FastAPI for the core web framework.
- Uvicorn as the ASGI server.
- Docker for containerization and easy deployment.
- Pydantic for data validation and settings management.
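To show how these pieces fit together, here is a minimal sketch of a FastAPI service with a Pydantic request model. The endpoint path, field names, and placeholder translation logic are illustrative assumptions, not the project's actual code; the real application lives in app/main.py, and its generated /docs page shows the true schema.

```python
# Minimal sketch of a FastAPI + Pydantic translation service.
# NOTE: the endpoint path, field names, and placeholder logic below are
# assumptions for illustration; see app/main.py and /docs for the real API.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Gemma Translate API")

class TranslateRequest(BaseModel):
    text: str
    source_lang: str = "en"
    target_lang: str = "de"

class TranslateResponse(BaseModel):
    translation: str

@app.post("/translate", response_model=TranslateResponse)
def translate(req: TranslateRequest) -> TranslateResponse:
    # A real implementation would prompt a loaded GGUF model via llama-cpp-python here.
    return TranslateResponse(translation=f"[{req.target_lang}] {req.text}")
```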
- Docker Desktop
- Conda (or another Python environment manager)
- Python 3.10+
Create and activate a Conda environment:
conda create -n translate python=3.11
conda activate translate
Install the dependencies, including huggingface_hub, which provides the hf tool used to download the models:
pip install "fastapi[standard]" "uvicorn[standard]" httpx llama-cpp-python huggingface_hub
We're going to fetch the GGUF model files from these repositories:
https://huggingface.co/bullerwins/translategemma-27b-it-GGUF
https://huggingface.co/bullerwins/translategemma-12b-it-GGUF
https://huggingface.co/bullerwins/translategemma-4b-it-GGUF
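If you want to see exactly which GGUF files each repository offers before downloading, a short huggingface_hub sketch (shown here against the 4B repo) can enumerate them:

```python
# List the GGUF files published in one of the TranslateGemma repositories.
from huggingface_hub import list_repo_files

for name in list_repo_files("bullerwins/translategemma-4b-it-GGUF"):
    if name.endswith(".gguf"):
        print(name)
```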
For comparison, here's a table of all of the above GGUF models and their quantizations:
| Model Family | Quantization Level | File Name | File Size | Quality Level |
|---|---|---|---|---|
| Gemma 4B | Q3_K_L | translategemma-4b-it-Q3_K_L.gguf | 2.24 GB | Low-Medium |
| | Q4_K_S | translategemma-4b-it-Q4_K_S.gguf | 2.38 GB | Medium |
| | Q4_K_M | translategemma-4b-it-Q4_K_M.gguf | 2.49 GB | Balanced |
| | Q5_K_S | translategemma-4b-it-Q5_K_S.gguf | 2.76 GB | High |
| | Q5_K_M | translategemma-4b-it-Q5_K_M.gguf | 2.83 GB | High+ |
| | Q6_K | translategemma-4b-it-Q6_K.gguf | 3.19 GB | Near Lossless |
| | Q8_0 | translategemma-4b-it-Q8_0.gguf | 4.13 GB | Reference |
| Gemma 12B | Q3_K_L | translategemma-12b-it-Q3_K_L.gguf | 6.48 GB | Medium |
| | Q4_K_S | translategemma-12b-it-Q4_K_S.gguf | 6.94 GB | High |
| | Q4_K_M | translategemma-12b-it-Q4_K_M.gguf | 7.30 GB | High (Recommended) |
| | Q5_K_S | translategemma-12b-it-Q5_K_S.gguf | 8.23 GB | Very High |
| | Q5_K_M | translategemma-12b-it-Q5_K_M.gguf | 8.45 GB | Very High |
| | Q6_K | translategemma-12b-it-Q6_K.gguf | 9.66 GB | Near Lossless |
| | Q8_0 | translategemma-12b-it-Q8_0.gguf | 12.5 GB | Reference |
| Gemma 27B | Q3_K_L | translategemma-27b-it-Q3_K_L.gguf | 14.5 GB | High |
| | Q4_K_S | translategemma-27b-it-Q4_K_S.gguf | 15.7 GB | Very High |
| | Q4_K_M | translategemma-27b-it-Q4_K_M.gguf | 16.5 GB | Very High (Recommended) |
| | Q5_K_S | translategemma-27b-it-Q5_K_S.gguf | 18.8 GB | Excellent |
| | Q5_K_M | translategemma-27b-it-Q5_K_M.gguf | 19.3 GB | Excellent |
| | Q6_K | translategemma-27b-it-Q6_K.gguf | 22.2 GB | Near Lossless |
| | Q8_0 | translategemma-27b-it-Q8_0.gguf | 28.7 GB | Reference |
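If you're unsure which quantization to pick, a rough sketch like the one below can shortlist files that fit a memory budget. The sizes are copied from the table; the 2 GB headroom is only a guess, since actual memory use also depends on context length and runtime overhead.

```python
# Shortlist quantizations from the table above that fit a given memory budget (GB).
# The headroom value is a rough assumption; real usage varies with context size, etc.
SIZES_GB = {
    ("Gemma 4B", "Q3_K_L"): 2.24,  ("Gemma 4B", "Q4_K_M"): 2.49,  ("Gemma 4B", "Q8_0"): 4.13,
    ("Gemma 12B", "Q3_K_L"): 6.48, ("Gemma 12B", "Q4_K_M"): 7.30, ("Gemma 12B", "Q8_0"): 12.5,
    ("Gemma 27B", "Q3_K_L"): 14.5, ("Gemma 27B", "Q4_K_M"): 16.5, ("Gemma 27B", "Q8_0"): 28.7,
}

def shortlist(budget_gb: float, headroom_gb: float = 2.0):
    """Return (size, family, quant) tuples whose file size plus headroom fits the budget."""
    return sorted(
        (size, family, quant)
        for (family, quant), size in SIZES_GB.items()
        if size + headroom_gb <= budget_gb
    )

for size, family, quant in shortlist(budget_gb=16.0):
    print(f"{family} {quant}: {size} GB")
```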
Download one of the following Gemma Translate models:
Gemma 4B:
hf download bullerwins/translategemma-4b-it-GGUF translategemma-4b-it-Q8_0.gguf --local-dir app/models/translategemma-4b-it-Q8_0
hf download bullerwins/translategemma-4b-it-GGUF translategemma-4b-it-Q6_K.gguf --local-dir app/models/translategemma-4b-it-Q6_K
hf download bullerwins/translategemma-4b-it-GGUF translategemma-4b-it-Q5_K_S.gguf --local-dir app/models/translategemma-4b-it-Q5_K_S
hf download bullerwins/translategemma-4b-it-GGUF translategemma-4b-it-Q5_K_M.gguf --local-dir app/models/translategemma-4b-it-Q5_K_M
hf download bullerwins/translategemma-4b-it-GGUF translategemma-4b-it-Q4_K_S.gguf --local-dir app/models/translategemma-4b-it-Q4_K_S
hf download bullerwins/translategemma-4b-it-GGUF translategemma-4b-it-Q4_K_M.gguf --local-dir app/models/translategemma-4b-it-Q4_K_M
hf download bullerwins/translategemma-4b-it-GGUF translategemma-4b-it-Q3_K_L.gguf --local-dir app/models/translategemma-4b-it-Q3_K_L
Gemma 12B:
hf download bullerwins/translategemma-12b-it-GGUF translategemma-12b-it-Q8_0.gguf --local-dir app/models/translategemma-12b-it-Q8_0
hf download bullerwins/translategemma-12b-it-GGUF translategemma-12b-it-Q6_K.gguf --local-dir app/models/translategemma-12b-it-Q6_K
hf download bullerwins/translategemma-12b-it-GGUF translategemma-12b-it-Q5_K_S.gguf --local-dir app/models/translategemma-12b-it-Q5_K_S
hf download bullerwins/translategemma-12b-it-GGUF translategemma-12b-it-Q5_K_M.gguf --local-dir app/models/translategemma-12b-it-Q5_K_M
hf download bullerwins/translategemma-12b-it-GGUF translategemma-12b-it-Q4_K_S.gguf --local-dir app/models/translategemma-12b-it-Q4_K_S
hf download bullerwins/translategemma-12b-it-GGUF translategemma-12b-it-Q4_K_M.gguf --local-dir app/models/translategemma-12b-it-Q4_K_M
hf download bullerwins/translategemma-12b-it-GGUF translategemma-12b-it-Q3_K_L.gguf --local-dir app/models/translategemma-12b-it-Q3_K_L
Gemma 27B:
hf download bullerwins/translategemma-27b-it-GGUF translategemma-27b-it-Q8_0.gguf --local-dir app/models/translategemma-27b-it-Q8_0
hf download bullerwins/translategemma-27b-it-GGUF translategemma-27b-it-Q6_K.gguf --local-dir app/models/translategemma-27b-it-Q6_K
hf download bullerwins/translategemma-27b-it-GGUF translategemma-27b-it-Q5_K_S.gguf --local-dir app/models/translategemma-27b-it-Q5_K_S
hf download bullerwins/translategemma-27b-it-GGUF translategemma-27b-it-Q5_K_M.gguf --local-dir app/models/translategemma-27b-it-Q5_K_M
hf download bullerwins/translategemma-27b-it-GGUF translategemma-27b-it-Q4_K_S.gguf --local-dir app/models/translategemma-27b-it-Q4_K_S
hf download bullerwins/translategemma-27b-it-GGUF translategemma-27b-it-Q4_K_M.gguf --local-dir app/models/translategemma-27b-it-Q4_K_M
hf download bullerwins/translategemma-27b-it-GGUF translategemma-27b-it-Q3_K_L.gguf --local-dir app/models/translategemma-27b-it-Q3_K_L
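The same downloads can be scripted with huggingface_hub directly if you prefer Python over the hf CLI. This mirrors one of the commands above; swap the repo and filename for the variant you want:

```python
# Programmatic equivalent of one of the `hf download` commands above.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="bullerwins/translategemma-4b-it-GGUF",
    filename="translategemma-4b-it-Q4_K_M.gguf",
    local_dir="app/models/translategemma-4b-it-Q4_K_M",
)
print("Model saved to:", path)
```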
This is the easiest and recommended way to run the application.
- Build the Docker image:
docker build -t fastapi_gemma_translate .
- Run the Docker container: This command runs the container in detached mode (-d) and maps port 8080 on your host to port 8080 in the container.
docker run -d --name ai_container_cpu -p 127.0.0.1:8080:8080 -v C:/Users/username/Desktop/git/fastapi-gemma-translate/_models:/code/models fastapi_gemma_translate
Alternatively, you can pull and run my Docker image:
- Pull the Docker image:
docker image pull grctest/fastapi_gemma_translate
- Run the Docker container: This command runs the container in detached mode (-d) and maps port 8080 on your host to port 8080 in the container.
docker run -d --name ai_container_cpu -p 127.0.0.1:8080:8080 -v C:/Users/username/Desktop/git/fastapi-gemma-translate/_models:/code/models grctest/fastapi_gemma_translate
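Either way, once the container has started you can confirm the API is reachable with a quick httpx check against FastAPI's default documentation routes; the paths listed in openapi.json are whatever endpoints the app actually exposes.

```python
# Confirm the containerized API is reachable on the mapped port.
import httpx

docs = httpx.get("http://127.0.0.1:8080/docs", timeout=10.0)
print("Swagger UI status:", docs.status_code)  # expect 200 once startup completes

spec = httpx.get("http://127.0.0.1:8080/openapi.json", timeout=10.0)
print("Exposed endpoints:", sorted(spec.json().get("paths", {})))
```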
The docker run commands above use the CPU-only image. If you want Nvidia CUDA GPU support, you'll need one of the CUDA Dockerfiles instead, either by:
- Building the Docker image:
docker build -t fastapi_gemma_translate_cuda:legacy -f LegacyCudaDockerfile .
docker build -t fastapi_gemma_translate_cuda:mainstream -f MainstreamCudaDockerfile .
docker build -t fastapi_gemma_translate_cuda:future -f FutureCudaDockerfile .
or, tagged for the grctest/ namespace:
docker build -t grctest/fastapi_gemma_translate_cuda:legacy -f LegacyCudaDockerfile .
docker build -t grctest/fastapi_gemma_translate_cuda:mainstream -f MainstreamCudaDockerfile .
docker build -t grctest/fastapi_gemma_translate_cuda:future -f FutureCudaDockerfile .
- Pulling the Docker image:
docker image pull grctest/fastapi_gemma_translate_cuda:legacy
Then run the container with the --gpus flag:
docker run --gpus all -d --name ai_container_gpu -p 127.0.0.1:8080:8080 -v C:/Users/username/Desktop/git/fastapi-gemma-translate/_models:/code/models -e LLAMA_N_GPU_LAYERS=-1 fastapi_gemma_translate_cuda:legacy
docker run --gpus all -d --name ai_container_cuda -p 127.0.0.1:8080:8080 -v C:/Users/username/Desktop/git/fastapi-gemma-translate/_models:/code/models -e LLAMA_N_GPU_LAYERS=-1 grctest/fastapi_gemma_translate_cuda:legacy
Note: Replace C:/Users/username/Desktop/git/fastapi-gemma-translate/_models with the path you downloaded the GGUF model folders and files to.
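The LLAMA_N_GPU_LAYERS environment variable controls GPU offload, with -1 meaning all layers. Below is a rough sketch of how such a setting typically feeds into llama-cpp-python; how the app itself reads the variable, the model path, and the prompt format are assumptions for illustration only.

```python
# Sketch: offloading layers to the GPU with llama-cpp-python.
# How the real app consumes LLAMA_N_GPU_LAYERS, the model path, and the prompt
# wording are assumptions for illustration.
import os
from llama_cpp import Llama

n_gpu_layers = int(os.environ.get("LLAMA_N_GPU_LAYERS", "0"))  # -1 = offload all layers

llm = Llama(
    model_path="app/models/translategemma-4b-it-Q4_K_M/translategemma-4b-it-Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=n_gpu_layers,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Translate to German: Hello, world!"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```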
Container GPU variants:
- Legacy: Pascal / 10xx Nvidia cards
- Mainstream: Turing to Ada / 20xx, 30xx, 40xx, A100 Nvidia cards
- Future: Blackwell / 50xx Nvidia cards
For development, you can run the application directly with Uvicorn, which enables auto-reloading.
uvicorn app.main:app --host 0.0.0.0 --port 8080 --reload
Once the server is running, you can access the interactive API documentation:
- Swagger UI: http://127.0.0.1:8080/docs
- ReDoc: http://127.0.0.1:8080/redoc
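As a quick usage example, here's how a client might call a translation endpoint with httpx. The /translate path and payload fields are assumptions for illustration; check the Swagger UI above for the actual endpoint names and request schema.

```python
# Example client call; the endpoint path and payload fields are assumptions --
# consult http://127.0.0.1:8080/docs for the real schema.
import httpx

payload = {
    "text": "Hello, world!",
    "source_lang": "en",
    "target_lang": "de",
}

resp = httpx.post("http://127.0.0.1:8080/translate", json=payload, timeout=120.0)
resp.raise_for_status()
print(resp.json())
```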
This project is licensed under the MIT License. See the LICENSE file for details.