Feature/docker compose optimizations #206
Conversation
…a.cpp

- Add YAML anchors for common configurations (x-common-config, x-huggingface-cache, x-auth-config, etc.)
- Reduce code duplication by ~200 lines across services
- Make llama.cpp image configurable via LLAMACPP_IMAGE environment variable
- Resolve ARM64/AMD64 platform compatibility issues
- Improve maintainability through centralized configuration patterns
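For illustration, a minimal sketch of the anchor pattern this commit describes. The anchor names and the LLAMACPP_IMAGE variable come from the PR; the service names (llamacpp, api), the cache path, and the exact settings inside each anchor are assumptions:

```yaml
# Shared defaults, merged into each service via YAML merge keys
x-common-config: &common-config
  restart: unless-stopped
  env_file: .env

# Shared Hugging Face cache mount, reused via a sequence anchor
x-huggingface-cache: &huggingface-cache
  - hf-cache:/root/.cache/huggingface

services:
  llamacpp:
    <<: *common-config
    # Overridable from .env; defaults to the multi-arch llama.cpp server image
    image: ${LLAMACPP_IMAGE:-ghcr.io/ggml-org/llama.cpp:server}
    volumes: *huggingface-cache

  api:
    <<: *common-config
    build: .
    volumes: *huggingface-cache

volumes:
  hf-cache:
```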
- Add --timeout 300 and --retries 3 flags to pip install
- Resolve intermittent build failures when downloading large packages (onnxruntime)
- Improve build reliability for CI/CD and slower network connections
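For reference, the flags in a hypothetical Dockerfile RUN line; the Dockerfile layout and the requirements file name are assumptions, while --timeout and --retries are standard pip options:

```dockerfile
# Tolerate slow downloads of large wheels (e.g. onnxruntime) and retry transient network failures
RUN pip install --timeout 300 --retries 3 -r requirements.txt
```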
- Document configurable llama.cpp Docker image option
- Provide examples for different architectures (ARM64, AMD64, CUDA)
- Keep .env.example in sync with docker-compose.yml capabilities
🤖 Augment PR Summary: Improves Docker-based dev/deploy ergonomics by making the llama.cpp decoder image configurable and increasing build robustness.
# Llama.cpp decoder service configuration
# Default: ghcr.io/ggml-org/llama.cpp:server (multi-arch)
# ARM64 specific: ghcr.io/ggml-org/llama.cpp:server-cuda (if needed)
# Alternative: local builds or custom images
Updated comment.
- Fix misleading comment about server-cuda being ARM64-specific
- CUDA images are for NVIDIA GPU support, not ARM64 architecture
- Clarify that server-cuda is for NVIDIA GPUs (typically x86_64)
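A plausible rendering of the corrected .env.example comment block; the exact wording is an assumption, and only the two image tags already shown above are taken from the PR:

```env
# Llama.cpp decoder service configuration
# Default: ghcr.io/ggml-org/llama.cpp:server (multi-arch: ARM64 and AMD64)
# NVIDIA GPU hosts (typically x86_64): ghcr.io/ggml-org/llama.cpp:server-cuda
# Alternative: local builds or custom images
```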
…rameter

- Add missing on_disk_payload parameter to FakeClient mock in test_ingest_schema_mode.py
- Resolves TypeError: FakeClient.create_collection() got an unexpected keyword argument 'on_disk_payload'
- Ensures test mocks match the real Qdrant client interface, which includes this parameter
🐳 Docker Compose Optimization: YAML Anchors & Configurable Services
Summary
Significantly improves Docker Compose maintainability by introducing YAML anchors and making the llama.cpp image configurable.
Changes
- YAML anchors (x-common-config, x-huggingface-cache, etc.) reducing ~200 lines of duplication
- Configurable llama.cpp image via the LLAMACPP_IMAGE environment variable for ARM64/AMD64 compatibility

Benefits
Testing
Migration
No breaking changes. Optionally set LLAMACPP_IMAGE in your .env for custom images.
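As a usage sketch, an .env override might look like the following; the image tags are the ones referenced earlier in this PR, and which one applies depends on your hardware:

```env
# Default (multi-arch ARM64/AMD64 CPU build): no override needed
# LLAMACPP_IMAGE=ghcr.io/ggml-org/llama.cpp:server

# NVIDIA GPU hosts (typically x86_64):
LLAMACPP_IMAGE=ghcr.io/ggml-org/llama.cpp:server-cuda
```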