Deploy a fully serverless, low-latency LLM inference API for DeepSeek R1 Distill (or other GGUF models) using AWS Lambda, SnapStart, FastAPI, and llama-cpp-python.
- 🔌 OpenAI-compatible `/chat/completions` FastAPI endpoint (example below)
- ⚡ SnapStart-enabled Lambda for ~1–2s cold starts
- 🔁 Streaming responses via Server-Sent Events (SSE)
- ⬇️ Model pulled from S3 using memfd (fast in-memory loading)
- 🔐 Optional IAM-based authentication (SigV4)
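Because the endpoint follows the OpenAI `/chat/completions` contract, any OpenAI-style client can call it. Below is a minimal sketch using the `openai` Python SDK (not one of the listed client dependencies, so `pip install openai` first), assuming the Function URL has no IAM auth and that the `model` field is a placeholder the server may ignore:

```python
# Minimal sketch (assumptions noted above): point the openai SDK at the
# Function URL and stream tokens over SSE.
from openai import OpenAI

client = OpenAI(
    base_url="https://xxxx.lambda-url.region.on.aws",  # your Function URL
    api_key="unused",  # no key required when IAM auth is disabled
)

stream = client.chat.completions.create(
    model="deepseek-r1-distill",  # placeholder; the server may not use this field
    messages=[{"role": "user", "content": "Explain SnapStart in one sentence."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```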
- FastAPI app deployed behind an AWS Lambda Function URL
- Custom Lambda Layer with `llama-cpp-python`
- S3 bucket stores the `.gguf` model (loading sketch below)
- Lambda Web Adapter streams responses
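The memfd-based load mentioned above works roughly as follows. This is a sketch of the general technique, not necessarily the exact code in `app/`; the `MODEL_BUCKET` and `MODEL_KEY` environment variable names mirror the deploy parameters below:

```python
# Sketch of S3 -> memfd -> llama.cpp loading (assumed approach; Linux-only).
import os

import boto3
from llama_cpp import Llama

bucket = os.environ["MODEL_BUCKET"]
key = os.environ["MODEL_KEY"]

fd = os.memfd_create("model.gguf")          # anonymous in-memory file
with os.fdopen(fd, "wb", closefd=False) as f:
    boto3.client("s3").download_fileobj(bucket, key, f)

# llama.cpp opens the in-memory file through its /proc fd path.
llm = Llama(model_path=f"/proc/self/fd/{fd}", n_ctx=4096)
```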
- AWS SAM CLI
- Python 3.12 & Docker
- AWS account with `s3:GetObject` & `lambda:InvokeFunctionUrl` permissions
- Model file (e.g. `DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf`)
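As an optional pre-flight check (a hypothetical helper, not part of this repo), you can confirm `s3:GetObject` access to the uploaded model with boto3; the bucket and key names below are the placeholders used in step 1:

```python
# Optional pre-flight check: HeadObject requires s3:GetObject,
# so success here confirms the model object is readable.
import boto3

boto3.client("s3").head_object(
    Bucket="YOUR_BUCKET",
    Key="DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf",
)
```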
1. Upload Model

```bash
wget https://huggingface.co/unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF/resolve/main/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf
aws s3 mb s3://YOUR_BUCKET
aws s3 cp DeepSeek-R1*.gguf s3://YOUR_BUCKET/
```
2. Build + Deploy

```bash
sam build
sam deploy --guided
```

When prompted, set the `MODEL_BUCKET` and `MODEL_KEY` parameters to match the bucket and object uploaded above.
3. Run the Client

```bash
pip install requests python-dotenv boto3
echo "CHAT_API_BASE=https://xxxx.lambda-url.region.on.aws" > .env
python client.py                                      # interactive CLI
python client.py --temperature 0.7 --max-tokens 512
```
Shortcuts:
- `/new`: start a new conversation
- `/quit`: exit
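If IAM auth is enabled on the Function URL, requests must carry a SigV4 signature (which is why `boto3` appears in the client dependencies). A minimal, non-streaming sketch, assuming the standard chat payload shape and a placeholder `us-east-1` region:

```python
# Hedged sketch: SigV4-sign a single /chat/completions request with botocore.
import json
import os

import boto3
import requests
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

url = os.environ["CHAT_API_BASE"].rstrip("/") + "/chat/completions"
region = "us-east-1"  # assumption: use your deployment region
body = json.dumps({"messages": [{"role": "user", "content": "Hello"}]})

creds = boto3.Session().get_credentials().get_frozen_credentials()
request = AWSRequest(method="POST", url=url, data=body,
                     headers={"Content-Type": "application/json"})
SigV4Auth(creds, "lambda", region).add_auth(request)  # Function URLs sign as "lambda"

response = requests.post(url, data=body, headers=dict(request.headers))
print(response.json()["choices"][0]["message"]["content"])
```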
To run the API locally:

```bash
python -m venv .venv
source .venv/bin/activate
cd app && pip install -r requirements.txt
uvicorn main:app --reload
```
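A quick smoke test against the local dev server, assuming uvicorn's default `http://127.0.0.1:8000` and that a non-streaming request is accepted (as in the OpenAI API):

```python
# Hypothetical local smoke test; adjust the port and payload to your setup.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/chat/completions",
    json={
        "messages": [{"role": "user", "content": "ping"}],
        "stream": False,  # assumption: non-streaming mode is supported
    },
    timeout=300,  # the first request loads the model and can be slow
)
print(resp.json())
```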
To remove the stack:

```bash
sam delete
```