A robust, production-ready Python toolkit that automates synchronization between a directory of `.gguf` model files and a llama-swap `config.yaml`. This project provides utilities for downloading models from Hugging Face, verifying their integrity, and automatically maintaining your llama-swap configuration.
This toolkit is designed to be a safe assistant, not a destructive enforcer. Its core principle is to never overwrite your work.
- **For New Models:** When a new `.gguf` file is detected, the sync script creates a complete, scaffolded entry in your `config.yaml` to get you started quickly.
- **For Existing Models:** The script's "audit" is non-destructive. It only checks whether an entry is missing essential keys from the default template; if so, it adds the missing keys with their default values. Any changes you have made to existing values, including descriptions, TTLs, or even the `cmd` string, are preserved.
You can customize your model entries with confidence, knowing this tool will not revert your changes.
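For illustration, here is a minimal sketch of that audit-and-merge idea, assuming the default template shown later in this README (the actual logic lives in `llama_swap_sync.py` and may differ in detail):

```python
# Minimal sketch: add missing template keys without touching existing values.
import copy

DEFAULT_ENTRY = {
    "aliases": [],
    "env": [],
    "ttl": 0,
    "unlisted": False,
    "filters": {},
    "metadata": {},
    "macros": {},
    "concurrencyLimit": 0,
    "cmdStop": "",
}

def audit_entry(entry: dict) -> bool:
    """Return True if any missing default keys were added."""
    changed = False
    for key, default in DEFAULT_ENTRY.items():
        if key not in entry:
            entry[key] = copy.deepcopy(default)  # fresh copy per entry
            changed = True
    return changed
```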
```text
llama-swap-sync/
├── llama_swap_sync.py    # Main sync script - manages config.yaml
├── download_model.py     # Download models from Hugging Face
├── update_models.py      # Check for and apply model updates
├── hf_utils.py           # Shared utility functions
├── requirements.txt      # Python dependencies
├── compose.yml           # Docker Compose configuration
├── .env                  # Environment variables (example)
├── config.yaml           # Generated llama-swap config
├── example_config.yaml   # Example configuration
└── README.md             # This file
```
- **Recursive Model Detection:** Automatically scans the models directory and all subdirectories to discover `.gguf` files at any depth
- **Smart Key Generation:** Creates safe, shortened YAML keys for long model names, with collision detection to prevent formatting issues
- **Stale Entry Pruning:** Optionally removes entries from the config file if the corresponding `.gguf` model file is no longer present (`--prune`)
- **Non-Destructive Auditing:** Preserves your manual edits while ensuring all entries have the required fields
- **Automated Backups:** Creates timestamped backups of your `config.yaml` before making changes and keeps only the most recent backups (configurable via `LLAMA_SWAP_MAX_BACKUPS`, default: 3)
- **Docker Integration:** Automatically restarts your Docker container after configuration changes using the Docker CLI
- **Production-Grade Safety:**
  - **Dry Run Mode:** Use `--dry-run` to preview all proposed changes without modifying any files or restarting services
  - **Atomic File Writes:** Prevents `config.yaml` corruption by writing changes to a temporary file before atomically replacing the original
  - **Concurrency Lock:** A `.lock` file mechanism prevents multiple instances from running simultaneously
  - **Path Validation:** Validates all file paths to prevent path traversal attacks
- **Flexible Configuration:** Configure via command-line arguments, environment variables, or script defaults
- **Structured Logging:** Provides clear, timestamped logs with verbosity controls (`--verbose`, `--quiet`)
- **Cross-Platform Support:** Works on Windows, Linux, and macOS with appropriate symbol rendering
- **Direct Hugging Face Integration:** Download `.gguf` models directly from Hugging Face repositories
- **Automatic Directory Structure:** Preserves the repository structure in your local models directory
- **SHA256 Verification:** Automatically verifies downloaded files against Hugging Face LFS hashes
- **Hash File Generation:** Creates `.sha256` files compatible with `sha256sum` for later verification
- **Retry Logic:** Automatically retries failed downloads with exponential backoff
- **Resume Support:** Downloads automatically resume if interrupted
- **Exit Code Standards:** Uses distinct exit codes for different error types for better automation
- **Managed Model Detection:** Scans for models with `.sha256` files to check for updates
- **Interactive Selection:** User-friendly menu for selecting which models to update
- **Batch Processing:** Update multiple models or all at once
- **Update Statistics:** Tracks and reports successful and failed updates
- **Cross-Platform:** Proper symbol rendering on all operating systems
- URL parsing for Hugging Face links
- SHA256 calculation with optional progress bars
- Hash validation with hexadecimal verification
- Local and remote hash file management
- Recursive `.gguf` file discovery
- Path validation and security checks
- Download progress and error handling
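As an example of the first item, parsing a Hugging Face blob URL might look like this (a sketch assuming the standard URL layout, not the exact code in `hf_utils.py`):

```python
# Minimal sketch: split https://huggingface.co/<org>/<repo>/blob/<rev>/<path>
# into a (repo_id, filename) pair.
from urllib.parse import urlparse

def parse_hf_url(url: str) -> tuple[str, str]:
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"Not a Hugging Face blob URL: {url}")
    repo_id = "/".join(parts[:2])   # e.g. TheBloke/Mistral-7B-Instruct-v0.2-GGUF
    filename = "/".join(parts[4:])  # file path after the revision segment
    return repo_id, filename
```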
- Python: 3.11 or newer
- Docker: Optional, only if you need automatic container restart
- Hugging Face Account: Required for downloading models (some models may require acceptance of license terms)
- Disk Space: At least 10GB free (models are typically 2-15GB each)
- GPU: Optional, for running llama-swap, but not required for this toolkit
- Clone or download the repository:
```bash
git clone https://github.com/pkeffect/llama-swap-sync
cd llama-swap-sync
```
- Create and activate a Python virtual environment (recommended):
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```
- Install the required dependencies:
```bash
pip install -r requirements.txt
```
- Configure environment variables (optional):
```bash
cp .env.example .env
# Edit .env with your preferred settings
```

Use `download_model.py` to download models from Hugging Face with automatic verification:
```bash
# Download a model
python download_model.py "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q8_0.gguf"

# Download to a custom directory
python download_model.py \
    --dest-dir /mnt/models \
    "https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/blob/main/Llama-3.2-3B-Instruct-f16.gguf"

# Skip hash verification (not recommended)
python download_model.py --skip-verification "https://..."

# Customize retry behavior
python download_model.py --retries 3 --retry-delay 5 "https://..."
```

How it works:
- Parses the Hugging Face URL to extract the repository and filename
- Creates the destination directory structure (e.g., `./models/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/`)
- Fetches the SHA256 hash from Hugging Face metadata
- Downloads the model file (with automatic resume support)
- Verifies the downloaded file against the hash, with a progress bar
- Creates a `.sha256` file for future verification
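Under those assumptions, the core of the flow can be sketched as follows (`expected_sha256` stands in for the hash fetched from Hugging Face metadata; the real `download_model.py` adds retries, progress bars, and exit codes):

```python
# Minimal sketch: hf_hub_download handles the transfer (and resume),
# then the file on disk is re-hashed and a .sha256 sidecar is written.
import hashlib
from huggingface_hub import hf_hub_download

def download_and_verify(repo_id: str, filename: str,
                        expected_sha256: str, dest_dir: str = "./models") -> str:
    path = hf_hub_download(repo_id=repo_id, filename=filename,
                           local_dir=f"{dest_dir}/{repo_id}")
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256.lower():
        raise RuntimeError(f"SHA256 mismatch for {path}")
    # sha256sum-compatible sidecar for later re-verification
    with open(path + ".sha256", "w") as f:
        f.write(f"{digest.hexdigest()}  {filename}\n")
    return path
```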
Use `update_models.py` to check for and apply updates to your models:

```bash
# Check for updates and interactively select which to apply
python update_models.py
```

The script will:
- Scan for all models with `.sha256` files
- Check each model against Hugging Face for updates
- Present an interactive menu for selection
- Download and verify selected updates
- Report success/failure statistics
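The comparison behind the first two steps can be sketched as follows (`get_remote_sha256` is a hypothetical stand-in for the metadata lookup in `hf_utils.py`):

```python
# Minimal sketch: a model needs an update when its stored .sha256 sidecar
# no longer matches the hash Hugging Face currently reports.
from pathlib import Path
from typing import Callable

def needs_update(model: Path, get_remote_sha256: Callable[[Path], str]) -> bool:
    sidecar = model.with_name(model.name + ".sha256")
    local_hash = sidecar.read_text().split()[0]  # "<hash>  <filename>" format
    return local_hash.lower() != get_remote_sha256(model).lower()
```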
The sync script runs from the command line and by default looks for `./models` and `./config.yaml`.
- Perform a dry run to see what would change:
```bash
python llama_swap_sync.py --dry-run
```
- Run the sync, adding new models and updating existing entries:
```bash
python llama_swap_sync.py
```
- Sync and remove stale entries from the config:
```bash
python llama_swap_sync.py --prune
```
- Run with verbose logging to see detailed operation information:
```bash
python llama_swap_sync.py --verbose
```
- Run in quiet mode, showing only warnings and errors:
```bash
python llama_swap_sync.py --quiet
```
- Sync without restarting the Docker container:
```bash
python llama_swap_sync.py --no-restart
```

Specify custom paths for the config file and models directory, prune stale entries, and restart a specific container:
```bash
python llama_swap_sync.py \
    --config /opt/llama-swap/config.yaml \
    --models-dir /mnt/models \
    --container my-llama-swap-container \
    --prune \
    --verbose
```

Set these in your `.env` file or shell environment for convenience:
| Variable | Description | Default |
|---|---|---|
| `LLAMA_SWAP_CONFIG` | Path to config file | `./config.yaml` |
| `LLAMA_SWAP_MODELS_DIR` | Path to models directory | `./models` |
| `LLAMA_SWAP_CONTAINER` | Docker container name | `llama-swap` |
| `LLAMA_SWAP_MAX_BACKUPS` | Number of backups to keep | `3` |
| `LLAMA_SWAP_MAX_KEY_LENGTH` | Maximum YAML key length before shortening | `80` |
| `MODELS_PATH` | Models volume for Docker | `./models` |
| `CONFIG_PATH` | Config volume for Docker | `./config.yaml` |
| `LLAMA_SWAP_PORT` | Web interface port | `8080` |
| `NVIDIA_VISIBLE_DEVICES` | GPU visibility | `all` |
| `GPU_COUNT` | Number of GPUs to use | `1` |
Linux/Mac:
```bash
export LLAMA_SWAP_CONFIG="/opt/llama-swap/config.yaml"
export LLAMA_SWAP_MODELS_DIR="/mnt/nvme/models"
export LLAMA_SWAP_CONTAINER="my-llama-container"
```

Windows (PowerShell):
```powershell
$env:LLAMA_SWAP_CONFIG="C:\llama-swap\config.yaml"
$env:LLAMA_SWAP_MODELS_DIR="D:\models"
$env:LLAMA_SWAP_CONTAINER="my-llama-container"
```

| Option | Description |
|---|---|
| `--config` | Path to the config file (default: `./config.yaml`) |
| `--models-dir` | Path to the models directory (default: `./models`) |
| `--container` | Docker container name to restart (default: `llama-swap`) |
| `--prune` | Remove entries from the config if their `.gguf` file is missing |
| `--no-restart` | Do not restart the Docker container after changes |
| `--dry-run` | Show what changes would be made without modifying files or services |
| `-v, --verbose` | Enable verbose, debug-level logging |
| `-q, --quiet` | Enable quiet logging, showing only warnings and errors |
| Option | Description | Default |
|---|---|---|
| `url` | Full Hugging Face URL to the model file (required) | - |
| `--dest-dir` | Root destination directory for models | `./models` |
| `--retries` | Number of times to retry failed downloads | `5` |
| `--retry-delay` | Initial delay in seconds between retries (doubles each time) | `10` |
| `--skip-verification` | Skip SHA256 verification after download | `False` |
The sync script recursively scans the models directory using `os.walk()`, discovering all `.gguf` files regardless of their depth in the directory structure. For each model:
- **Path Normalization:** Converts Windows paths to POSIX-style paths for consistency
- **Key Generation:** Creates a unique key by replacing `/` with `--` (e.g., `TheBloke/model.gguf` → `TheBloke--model`)
- **Smart Shortening:** For keys longer than 80 characters, the script shortens them using SHA256 hash suffixes (see the sketch below)
- **Collision Detection:** Checks for and prevents any key collisions that would cause data loss
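A sketch of this pipeline under the defaults above (the 8-character hash suffix is illustrative; see `llama_swap_sync.py` for the exact rules):

```python
# Minimal sketch: normalize separators, strip the extension, build the key,
# and shorten over-long keys with a deterministic SHA256 suffix.
import hashlib
from pathlib import PureWindowsPath

MAX_KEY_LENGTH = 80  # overridable via LLAMA_SWAP_MAX_KEY_LENGTH

def make_key(rel_path: str) -> str:
    posix = PureWindowsPath(rel_path).as_posix()  # backslashes -> "/"
    stem = posix[:-5] if posix.endswith(".gguf") else posix
    key = stem.replace("/", "--")
    if len(key) > MAX_KEY_LENGTH:
        suffix = hashlib.sha256(posix.encode()).hexdigest()[:8]
        key = f"{key[:MAX_KEY_LENGTH - 9]}-{suffix}"
    return key
```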
Each model entry in `config.yaml` includes all required llama-swap fields:
```yaml
bartowski--Mistral-7B-Instruct-v0.2-GGUF--mistral-7b-instruct-v0.2.Q8_0:
  name: bartowski / Mistral 7B Instruct v0.2 GGUF / mistral 7b instruct v0.2.Q8 0
  description: Auto-generated entry for bartowski/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q8_0.gguf
  cmd: |
    /app/llama-server
    -m /models/bartowski/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q8_0.gguf
    -ngl 99
    -c 4096
    -b 2048
    -ub 512
    --temp 0.7
    --top-p 0.95
    --top-k 40
    --repeat-penalty 1.1
    --port ${PORT}
    --host 0.0.0.0
  aliases: []
  env: []
  ttl: 0
  unlisted: false
  filters: {}
  metadata: {}
  macros: {}
  concurrencyLimit: 0
  cmdStop: ''
```

The script maintains up to 3 timestamped backups by default (configurable via `LLAMA_SWAP_MAX_BACKUPS`). Before creating a new backup, it removes the oldest backup if the limit is reached.
Backups are named: `config.yaml.bak.YYYYMMDD_HHMMSS`
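A minimal sketch of the rotation, assuming the naming pattern above (the real script may order the create and prune steps differently):

```python
# Minimal sketch: copy config.yaml to a timestamped .bak file, then keep
# only the newest MAX_BACKUPS (timestamped names sort chronologically).
import shutil
import time
from pathlib import Path

MAX_BACKUPS = 3  # overridable via LLAMA_SWAP_MAX_BACKUPS

def backup_config(config: Path) -> Path:
    stamp = time.strftime("%Y%m%d_%H%M%S")
    backup = config.with_name(f"{config.name}.bak.{stamp}")
    shutil.copy2(config, backup)
    for old in sorted(config.parent.glob(f"{config.name}.bak.*"))[:-MAX_BACKUPS]:
        old.unlink()
    return backup
```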
Restoring from Backup:

```bash
# List available backups
ls -lh config.yaml.bak.*

# Restore a specific backup
cp config.yaml.bak.20250115_143022 config.yaml
```

The script uses the Docker CLI directly via `subprocess` to restart containers, bypassing Python library compatibility issues. This approach (see the sketch after the list):
- Works reliably on Windows, Linux, and macOS
- Handles Docker Desktop configurations correctly
- Provides clear error messages if Docker is not running
- Times out after 60 seconds to prevent hanging
- Includes health check in compose.yml to verify container readiness
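The restart itself reduces to a single CLI call; a minimal sketch (the real script adds logging and respects `--no-restart` and `--dry-run`):

```python
# Minimal sketch: restart a container via the Docker CLI with a 60s timeout.
import subprocess

def restart_container(name: str = "llama-swap") -> bool:
    try:
        subprocess.run(["docker", "restart", name],
                       check=True, capture_output=True, timeout=60)
        return True
    except FileNotFoundError:
        print("Docker CLI not found; is Docker installed and on PATH?")
    except subprocess.TimeoutExpired:
        print(f"Timed out after 60 seconds restarting '{name}'")
    except subprocess.CalledProcessError as exc:
        print(f"Docker error: {exc.stderr.decode(errors='replace').strip()}")
    return False
```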
- All file paths are validated to prevent path traversal attacks
- Relative paths only; absolute paths are rejected
- No `..` sequences allowed in paths
- SHA256 verification prevents file corruption and MITM attacks
- Hexadecimal validation ensures hash integrity
- Optional progress bars for large files
- Uses `yaml.safe_load()` and `yaml.SafeDumper` exclusively
- Prevents YAML injection attacks
- No code execution through YAML parsing
- Lock file mechanism prevents simultaneous execution
- Atomic file writes prevent corruption
- Proper cleanup in error conditions
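A minimal sketch of the atomic-write pattern (an assumed approach; the real script may differ in temp-file handling):

```python
# Minimal sketch: dump YAML to a temp file in the same directory, then
# os.replace() it over the original in a single atomic step.
import os
import tempfile
import yaml

def atomic_write_yaml(path: str, data: dict) -> None:
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            yaml.dump(data, f, Dumper=yaml.SafeDumper, sort_keys=False)
        os.replace(tmp, path)  # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise
```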
- Download a model from Hugging Face:
```bash
python download_model.py "https://huggingface.co/bartowski/..."
```
- Sync your configuration:
```bash
python llama_swap_sync.py
```
- Your llama-swap container automatically restarts with the new model available!
- Periodically check for updates:
```bash
python update_models.py
```

If you see errors about connecting to Docker:
- Ensure Docker Desktop is running: `docker ps`
- The script uses the Docker CLI directly, so if `docker ps` works, the script should work
- Check that your container name matches (default: `llama-swap`)
- Verify Docker is in your PATH
If llama-swap reports config errors:
- Check for keys with `?` and `:` on separate lines; these indicate formatting issues
- Run the sync script again; it automatically detects and fixes these issues
- Use `--prune` to remove old malformed entries
- Verify the `.gguf` file exists in the models directory
- Check file permissions
- Run with `--verbose` to see detailed scanning output
- Ensure the file path doesn't contain `..` or other invalid characters
- The download may have been interrupted; try downloading again
- The file may be corrupted; delete and re-download
- If the failure persists, use `--skip-verification` (not recommended) and report the issue
On some Windows terminals, you may see garbled characters instead of checkmarks. This is normal and doesn't affect functionality. The script automatically uses ASCII alternatives on Windows when possible.
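One plausible way to implement such a fallback (an assumption; the real script may key off the platform rather than the terminal encoding):

```python
# Minimal sketch: prefer Unicode symbols only when stdout is UTF-capable.
import sys

def symbol(pretty: str, ascii_fallback: str) -> str:
    encoding = (sys.stdout.encoding or "").lower()
    return pretty if "utf" in encoding else ascii_fallback

CHECK = symbol("\u2713", "[OK]")    # checkmark on UTF-8 terminals
CROSS = symbol("\u2717", "[FAIL]")  # cross mark on UTF-8 terminals
```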
Initial setup:

```bash
# Create models directory
mkdir -p models

# Download several models
python download_model.py "https://huggingface.co/..."
python download_model.py "https://huggingface.co/..."

# Generate initial config
python llama_swap_sync.py

# Start llama-swap with docker-compose
docker-compose up -d
```

Scheduled maintenance:

```bash
# Add this to a cron job or scheduled task
python llama_swap_sync.py --quiet --prune
python update_models.py
```

Server migration:

```bash
# On the old server: back up models and config
tar -czf models-backup.tar.gz models/
cp config.yaml config.yaml.backup

# On the new server: restore and sync
tar -xzf models-backup.tar.gz
python llama_swap_sync.py --dry-run  # Preview changes
python llama_swap_sync.py            # Apply changes
```

Debugging:

```bash
# Preview changes without applying
python llama_swap_sync.py --dry-run --verbose

# Check for stale entries
python llama_swap_sync.py --dry-run --prune
```

Understanding exit codes helps with automation and scripting:
`download_model.py`:

- `0` - Success
- `1` - Invalid URL format
- `2` - Download failed
- `3` - Verification failed

`llama_swap_sync.py`:

- `0` - Success (changes made or no changes needed)
- `1` - Critical error (lock file exists, YAML corrupt, etc.)

`update_models.py`:

- `0` - Success
- `1` - Error during update process
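For example, an automation wrapper can branch on these codes (a sketch; the URL is a placeholder):

```python
# Minimal sketch: react to download_model.py's exit codes in automation.
import subprocess
import sys

result = subprocess.run([sys.executable, "download_model.py",
                         "https://huggingface.co/..."])
if result.returncode == 2:
    print("Download failed; a retry later may succeed")
elif result.returncode == 3:
    print("Verification failed; delete the file and re-download")
sys.exit(result.returncode)
```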
Core dependencies (see `requirements.txt` for versions):

- **PyYAML** (`~=6.0.1`): YAML parsing and generation
- **huggingface-hub** (`~=6.1.4`): Downloading models from Hugging Face
- **tqdm** (`~=4.67.1`): Progress bars for long operations

Note: The `docker` Python library is no longer required; the script uses the Docker CLI directly.
- Removed unused dependencies (`docker`, `blake3`)
- Added SHA256 validation for hash files
- Added progress bars for hash calculation
- Improved cross-platform symbol support
- Added key collision detection
- Made `MAX_BACKUPS` and `MAX_KEY_LENGTH` configurable
- Improved Docker error messages
- Added exit code constants
- Added health check to docker-compose
- Added resource limits to docker-compose
- Enhanced module-level documentation
- Lock file now exits with an error code on conflict
- Updated `example_config.yaml` to match the actual structure
- Initial public release
- Core sync functionality
- Download utility
- Update checker
- Comprehensive documentation
Contributions are welcome! Please ensure:
- Code follows the existing style and conventions
- All functions have docstrings
- Version numbers are updated in file headers
- README is updated for new features
- Security considerations are maintained
This project is licensed under the MIT License.
- Built for use with llama-swap by mostlygeek
- Designed for managing GGUF models from Hugging Face repositories
- Thanks to the Python and open-source community
- Thanks to the llama.cpp team
For issues, questions, or contributions:
- Open an issue on GitHub
- Check existing issues for solutions
- Review this README thoroughly before reporting issues
Note: This toolkit manages configuration files and Docker containers. Always review changes with `--dry-run` before applying them in production environments.