A memory-efficient alternative to Docker's GenAI Stack, designed to run within 6-8GB RAM. Features an educational Learning Mode that visualizes the RAG pipeline in real-time.
| Feature | Docker GenAI Stack | Lightweight GenAI Stack |
|---|---|---|
| Min RAM | 20GB+ | 6-8GB |
| Vector DB | Neo4j (heavy) | ChromaDB (light) |
| Default Model | Llama2 7B (~4.5GB) | TinyLlama 1.1B (~600MB) |
| Embeddings | Sentence Transformers | nomic-embed-text (768-dim) |
| Framework | LangChain + Streamlit | LangChain + Streamlit |
| Features | GraphRAG, Knowledge Graph | Simple RAG + Learning Mode |
| Component | RAM Usage |
|---|---|
| Ollama + tinyllama:1.1b | ~1-2GB |
| nomic-embed-text | ~300MB |
| ChromaDB | ~256-512MB |
| Streamlit App | ~512MB-1GB |
| OS + Docker | ~1-2GB |
| Total | ~4-6GB |
# Create directory
mkdir lightweight-genai-stack && cd lightweight-genai-stack
# Copy the files from this project

# Start all services
docker compose up -d
# Watch the logs (model download takes a few minutes)
docker compose logs -f model-puller

Open http://localhost:8501 in your browser.
lightweight-genai-stack/
├── docker-compose.yml     # Main orchestration
├── .env.example           # Configuration template
├── README.md
├── WORKSHOP.md            # Detailed workshop guide
├── chroma_stats.py        # ChromaDB statistics script
├── rag_query.py           # RAG query testing script
├── test_chroma.py         # Full ChromaDB test suite
└── app/
    ├── Dockerfile         # Streamlit app image
    ├── requirements.txt   # Python dependencies
    └── main.py            # RAG application (Learning Mode)
Edit docker-compose.yml or create .env:
| Available RAM | Recommended Model | Notes |
|---|---|---|
| 6GB | tinyllama:1.1b | Default - Fastest, ~600MB |
| 8GB | phi3:mini | Better quality, ~2.3GB |
| 8GB | llama3.2:3b | Good general purpose |
| 8GB | qwen2.5:3b | Good for multilingual |
Current Default Configuration:
- LLM Model: `tinyllama:1.1b` (~600MB, fast inference)
- Embedding Model: `nomic-embed-text` (768-dimensional vectors)
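For reference, a minimal .env matching these defaults could look like this (the LLM_MODEL and EMBEDDING_MODEL variable names are the same ones used in the scaling examples at the end of this README):

# .env - minimal sketch; copy from .env.example and adjust to your RAM budget
LLM_MODEL=tinyllama:1.1b
EMBEDDING_MODEL=nomic-embed-text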
# In docker-compose.yml, adjust limits:
services:
  ollama:
    deploy:
      resources:
        limits:
          memory: 3G  # Reduce if using tinyllama

Direct conversation with the LLM without documents.
- Upload PDF, TXT, or Markdown files
- Documents are chunked (500 chars) and embedded (768-dim vectors)
- Retrieval-augmented generation for accurate answers
- Real-time RAG pipeline visualization
- Step-by-step display: Query Embedding → Similarity Search → Context Retrieval → LLM Generation (see the sketch after this feature list)
- Timing information for each step
- View retrieved source chunks with page numbers
- Live chunk and document counts in sidebar
- Document breakdown showing chunks per file
- ChromaDB stores embeddings persistently
- Chat history maintained in session
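Under the hood this is a standard LangChain flow. The sketch below is illustrative rather than a copy of app/main.py: it assumes the langchain-ollama, langchain-chroma, and langchain-text-splitters packages, the in-network Ollama URL http://ollama:11434, and a ChromaDB persist directory of /data/chroma.

# Illustrative sketch of the ingest + query pipeline (not the actual app/main.py).
# Assumed: langchain-ollama, langchain-chroma, langchain-text-splitters packages,
# Ollama reachable at http://ollama:11434, ChromaDB persisted under /data/chroma.
import time

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_ollama import OllamaEmbeddings, OllamaLLM
from langchain_chroma import Chroma

embeddings = OllamaEmbeddings(model="nomic-embed-text", base_url="http://ollama:11434")
llm = OllamaLLM(model="tinyllama:1.1b", base_url="http://ollama:11434")

# Ingest: split into ~500-character chunks, embed, store in ChromaDB
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(open("runbook.txt").read())  # any uploaded document
store = Chroma.from_texts(
    chunks, embeddings, collection_name="documents", persist_directory="/data/chroma"
)

# Query: embed the question, retrieve similar chunks, generate with the context
question = "How do I restart the service?"
t0 = time.time()
hits = store.similarity_search(question, k=3)            # Query Embedding + Similarity Search
t1 = time.time()
context = "\n\n".join(doc.page_content for doc in hits)  # Context Retrieval
answer = llm.invoke(
    f"Answer using only this context:\n{context}\n\nQuestion: {question}"
)                                                         # LLM Generation
t2 = time.time()
print(f"retrieval {t1 - t0:.2f}s | generation {t2 - t1:.2f}s")
print(answer)

The per-stage timings here correspond to what Learning Mode displays for each step.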
# Start the stack
docker compose up -d
# Check logs
docker compose logs -f
# Check Ollama models
docker exec ollama ollama list
# Pull a different model
docker exec ollama ollama pull llama3.2:3b
# Stop everything
docker compose down
# Stop and remove volumes (fresh start)
docker compose down -v
# Check memory usage
docker stats

Three utility scripts are provided for inspecting ChromaDB and testing RAG queries:
Shows document and chunk counts in ChromaDB:
docker exec genai-app python /app/chroma_stats.py

Output:
============================================================
CHROMADB STATISTICS
============================================================
Collection: documents
----------------------------------------
Total chunks: 6,693
Unique documents: 3
Documents breakdown:
- report.pdf: 2,231 chunks
- manual.pdf: 2,231 chunks
- guide.pdf: 2,231 chunks
============================================================
When to use: After uploading documents to verify they were processed correctly.
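If you want to adapt or understand the script, a rough equivalent built on the chromadb client looks like this (a sketch only; the /data/chroma path and the source metadata key are assumptions inferred from the output above, not taken from the bundled script):

# Rough equivalent of chroma_stats.py (illustrative; persist path and the
# "source" metadata key are assumptions, not taken from the bundled script).
from collections import Counter

import chromadb

client = chromadb.PersistentClient(path="/data/chroma")
collection = client.get_collection("documents")

# Fetch only metadata and count chunks per source file
metadatas = collection.get(include=["metadatas"])["metadatas"]
per_doc = Counter(meta.get("source", "unknown") for meta in metadatas)

print(f"Total chunks: {collection.count():,}")
print(f"Unique documents: {len(per_doc)}")
for source, count in per_doc.most_common():
    print(f"  - {source}: {count:,} chunks")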
Run similarity searches against your documents:
# Single query
docker exec genai-app python /app/rag_query.py "What is the main topic?"
# Interactive mode
docker exec -it genai-app python /app/rag_query.py

Output:
Connected to ChromaDB | Collection: documents | Chunks: 6,693
============================================================
QUERY: What is the main topic?
============================================================
Found 3 results:
[1] Similarity: 0.510 | Source: report.pdf | Page: 12
------------------------------------------------------------
The main topic of this document covers...
When to use:
- Testing if documents are being retrieved correctly
- Debugging why certain queries aren't finding relevant content
- Comparing similarity scores for different query phrasings
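A stripped-down version of such a query script might look like the following (a sketch, not the bundled rag_query.py; it assumes chunks carry source and page metadata and reports similarity as 1 minus the cosine distance, which may differ from how the real script scores results):

# Minimal similarity-search sketch (not the bundled rag_query.py).
# Assumed: "source"/"page" metadata on each chunk; similarity = 1 - cosine distance.
import sys

import chromadb
from langchain_ollama import OllamaEmbeddings

query = sys.argv[1] if len(sys.argv) > 1 else "What is the main topic?"
embeddings = OllamaEmbeddings(model="nomic-embed-text", base_url="http://ollama:11434")

client = chromadb.PersistentClient(path="/data/chroma")
collection = client.get_collection("documents")

results = collection.query(query_embeddings=[embeddings.embed_query(query)], n_results=3)
rows = zip(results["documents"][0], results["metadatas"][0], results["distances"][0])
for i, (text, meta, dist) in enumerate(rows, start=1):
    print(f"[{i}] Similarity: {1 - dist:.3f} | Source: {meta.get('source')} | Page: {meta.get('page')}")
    print(text[:200])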
Comprehensive ChromaDB inspection with sample queries:
docker exec genai-app python /app/test_chroma.py

When to use: Initial setup verification or troubleshooting RAG issues.
Ollama API is exposed on port 11434:
# Chat with the model directly
curl http://localhost:11434/api/generate -d '{
"model": "tinyllama:1.1b",
"prompt": "Explain Docker in 3 sentences",
"stream": false
}'
# List available models
curl http://localhost:11434/api/tags

from langchain_ollama import OllamaLLM
llm = OllamaLLM(
model="tinyllama:1.1b",
base_url="http://localhost:11434"
)
response = llm.invoke("What is Kubernetes?")
print(response)

If you want to use OpenAI/Anthropic instead of local models:
- Comment out the `ollama` and `model-puller` services
- Update `app/main.py` to use `ChatOpenAI` or `ChatAnthropic`
- Add your API key to `.env`
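For example, the model swap in app/main.py could be as small as this (a sketch assuming the langchain-openai package and an OPENAI_API_KEY entry in .env; the model name is only an example):

# Sketch of the cloud-model swap (assumes the langchain-openai package is installed
# and OPENAI_API_KEY is set, e.g. loaded from .env).
from langchain_openai import ChatOpenAI

# Replaces the local OllamaLLM(...) instance
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # example model name

response = llm.invoke("What is Kubernetes?")
print(response.content)  # chat models return a message object, not a plain string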
# Check if model is downloaded
docker exec ollama ollama list
# Manually pull model
docker exec ollama ollama pull phi3:mini

# Check what's using memory
docker stats
# Use a smaller model
docker exec ollama ollama pull tinyllama:1.1b
# Update LLM_MODEL in docker-compose.yml

# Check Ollama health
curl http://localhost:11434/api/tags
# Restart Ollama
docker compose restart ollama

This stack is perfect for:
- Local AI-assisted documentation - Query your runbooks
- Incident analysis - RAG over incident reports
- Code review assistant - Analyze code files
- Learning/demos - Teach GenAI concepts without cloud costs
When you have more RAM available:
# For 16GB RAM, use better models:
LLM_MODEL=llama3.1:8b
EMBEDDING_MODEL=nomic-embed-text
# For 32GB+ RAM, match Docker GenAI stack:
LLM_MODEL=llama2:13b

Inspired by:
- Docker GenAI Stack
- pi-genai-stack (Raspberry Pi version)
- Ollama
- ChromaDB
MIT License - Use freely!