Pre-download ML models for multi-architecture Docker builds #285
aponcedeleonch merged 5 commits into `main`
Conversation
- Add `scripts/download_models.py` to download architecture-independent ONNX models (FastEmbed, tiktoken, LLMLingua) before the Docker build
- Remove the model-downloader stage from the Dockerfile; copy pre-downloaded models from the build context instead
- Add a `download-models` Taskfile target and make `build`/`build-local` depend on it
- Create a reusable `download-models.yml` GitHub workflow that uploads models as artifacts
- Update `image-build.yml`, `release.yml`, `integration-tests.yml`, and `offline-tests.yml` to download the models artifact before building images
- Add `optimum[onnxruntime]` to dev dependencies for LLMLingua ONNX export
- Add `models/` to .gitignore

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
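The orchestration in such a pre-download script can be sketched roughly as follows (the function names, `models/` layout, and non-empty-directory cache check are illustrative assumptions, not the PR's actual `scripts/download_models.py`):

```python
"""Sketch of pre-downloading ML models into a build-context directory.

All names and paths here are assumptions for illustration; the PR's
actual script may differ.
"""
from pathlib import Path
from typing import Callable


def is_cached(cache_path: Path) -> bool:
    """A model counts as cached when its directory exists and is non-empty."""
    return cache_path.is_dir() and any(cache_path.iterdir())


def run_downloads(
    models_dir: Path, downloaders: dict[str, Callable[[Path], None]]
) -> list[str]:
    """Run each downloader whose target directory is not yet populated.

    Returns the names of the models that were actually downloaded, so a
    second invocation against a warm cache is a no-op.
    """
    downloaded = []
    for name, download in downloaders.items():
        target = models_dir / name
        if is_cached(target):
            continue  # already fetched on a previous run
        target.mkdir(parents=True, exist_ok=True)
        download(target)
        downloaded.append(name)
    return downloaded
```

Because the downloaded ONNX weights are architecture-independent, a layout like this lets a single fetch feed both amd64 and arm64 image builds from the build context.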
- Add a `llmlingua_model_path` field to `MCPOptimizerConfig`
- Add `_get_default_model_paths()` to calculate paths relative to the package root
- Add `_setup_model_paths()` to set defaults only if the models directory exists
- Update `load_config()` to call `_setup_model_paths()`
- Add `LLMLINGUA_MODEL_PATH` to `env_mappings`

The config now automatically uses the `models/` directory when models are pre-downloaded.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
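The "set defaults only if the models directory exists" behavior can be sketched like this (a simplified stand-in using a plain dict; the real code works on `MCPOptimizerConfig`, and the helper names here mirror but are not the PR's private functions):

```python
"""Sketch of defaulting model paths to a pre-downloaded models/ directory.

Simplified illustration: the real implementation operates on
MCPOptimizerConfig, not a dict.
"""
from pathlib import Path


def get_default_model_paths(package_root: Path) -> dict[str, Path]:
    """Map each config key to its expected location under models/."""
    models_dir = package_root / "models"
    return {"llmlingua_model_path": models_dir / "llmlingua"}


def setup_model_paths(config: dict, package_root: Path) -> dict:
    """Fill in a default only when it is unset AND the directory exists,
    so behavior is unchanged when models were not pre-downloaded."""
    for key, path in get_default_model_paths(package_root).items():
        if config.get(key) is None and path.is_dir():
            config[key] = str(path)
    return config
```

Guarding on `path.is_dir()` means an explicit value (e.g. from the `LLMLINGUA_MODEL_PATH` env mapping) always wins, and a missing `models/` directory leaves the config untouched rather than pointing at a nonexistent path.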
PR Review

This PR successfully refactors the Docker build to pre-download ML models before building, enabling multi-architecture support. The implementation is solid, with good structure and documentation.

✅ Strengths
🔍 Issues Found

1. Inconsistent dependency group usage (scripts/download_models.py:88)
2. Redundant logging in config.py (_setup_model_paths)
3. Error handling could be more specific (download_models.py:115-149)
4. Minor: Variable naming inconsistency
💡 Suggestions

Performance: The script runs sequentially. Since the downloads are independent, consider parallel execution.

Code clarity (download_models.py:59-60):

```python
files = list(cache_path.iterdir())
return len(files) > 0
```

Can be simplified to:

Security & Performance
Minor Style Notes
Overall, excellent work! The main issue is the incorrect error message; the other suggestions are optional improvements.
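Following the reviewer's "Code clarity" note, the two-line directory check at download_models.py:59-60 could collapse into a single short-circuiting `any()` call (a sketch; `has_cached_files` is an illustrative name, not from the PR):

```python
from pathlib import Path


def has_cached_files(cache_path: Path) -> bool:
    # any() stops at the first directory entry, so the full listing is
    # never materialized the way list(cache_path.iterdir()) would be.
    return any(cache_path.iterdir())
```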
- Remove the misleading `FileNotFoundError` catch; add `_check_optimum_available()` to verify module availability before running the subprocess
- Rename `export_llmlingua_model()` to `download_and_export_llmlingua_model()` for clearer terminology
- Use subprocess `check=True` and catch `CalledProcessError` for cleaner error handling
- Remove redundant `ImportError` catches (fastembed/tiktoken are imported at module level)
- Remove verbose debug logs from `_setup_model_paths()` in config.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
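The availability check plus `check=True` / `CalledProcessError` pattern this commit describes can be sketched as follows (the helper names and the use of `importlib.util.find_spec` for the check are assumptions about shape, not the PR's exact code):

```python
"""Sketch of verifying a module exists before shelling out to it."""
import importlib.util
import subprocess
import sys


def check_optimum_available() -> bool:
    """Confirm the module is importable before invoking it in a subprocess,
    so a missing dependency fails with a clear message up front."""
    return importlib.util.find_spec("optimum") is not None


def export_model(cmd: list[str]) -> None:
    try:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a failed export surfaces immediately instead of being ignored.
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(
            f"model export failed with exit code {exc.returncode}"
        ) from exc
```

This avoids the misleading `FileNotFoundError` path: the subprocess never runs unless the module was found, and any runtime failure is reported with its real exit code.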
The download-models workflow was being called multiple times (from image-build, integration-tests, offline-tests, and mcp-tef-integration-tests), causing artifact name conflicts (409 Conflict errors).

Changes:
- Move the download-models call to code-checks.yml (runs once at the start)
- Remove the download-models job from the individual test workflows
- All workflows now download the artifact named 'ml-models' directly
- Add a GitHub cache for models to speed up subsequent runs
- release.yml now depends on code_checks for the artifact

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>