A multi-interface platform for managing and serving GGUF (GPT-Generated Unified Format) model files, providing CLI, desktop GUI, and web-based access.
gglib provides a simple interface to catalog, organize, and serve GGUF models locally. It maintains a SQLite database of your models with their metadata, making it easy to find and work with specific models.
- Add models: Import GGUF files and extract metadata automatically
- List models: View all models with their properties in a clean table format
- Update models: Edit model metadata including name, parameters, and custom fields
- Remove models: Clean removal of models from your database
- Serve models: Start llama-server with automatic context size detection
- Chat via CLI: Launch llama-cli directly for quick terminal chat sessions
- OpenAI-compatible Proxy: Automatic model swapping with OpenAI API compatibility
- HuggingFace Hub Integration: Download models directly from HuggingFace Hub
- Fast-path Downloads: Managed Python helper (hf_xet) via Miniconda for multi-gigabyte transfers
- Search & Browse: Discover GGUF models on HuggingFace with search and browse commands
- Quantization Support: Intelligent detection and handling of various quantization formats
- Rich metadata: Support for complex metadata operations and Unicode content
- Reasoning Model Support: Auto-detection and streaming of thinking/reasoning phases with collapsible UI for models like DeepSeek R1, Qwen3, and QwQ
GGLib is organized as a Cargo workspace with compile-time enforced boundaries. The architecture follows a layered design where adapters depend on infrastructure, which depends on core—never the reverse.
```text
┌─────────────────────────────────────────────────────────────────────────────────────┐
│ Core Layer │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
│ │ gglib-core │ │
│ │ Pure domain types, ports & traits (no infra deps) │ │
│ └─────────────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────┘
│
┌─────────────┬─────────────┬─────┴─────┬─────────────┬─────────────┐
▼ ▼ ▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────────────────┐
│ Infrastructure Layer │
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ gglib-db │ │ gglib-gguf │ │ gglib-mcp │ │ gglib-proxy│ │
│ │ SQLite │ │ GGUF file │ │ MCP │ │ OpenAI- │ │
│ │ repos │ │ parser │ │ servers │ │ compat │ │
│ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │
│ │
│ ╔═══════════════════════════════════════════════════════════════════════════════╗ │
│ ║ External Gateways ║ │
│ ║ ║ │
│ ║ ┌────────────────────────────────────┐ ┌────────────────────────────────┐ ║ │
│ ║ │ gglib-runtime │ │ gglib-download │ ║ │
│ ║ │ Process lifecycle manager │ │ Download orchestrator │ ║ │
│ ║ │ ONLY component that spawns │ │ ONLY component that contacts │ ║ │
│ ║ │ & manages llama-server │ │ HuggingFace Hub │ ║ │
│ ║ │ │ │ (via gglib-hf + optional │ ║ │
│ ║ │ │ │ hf_xet subprocess) │ ║ │
│ ║ └────────────────────────────────────┘ └────────────────────────────────┘ ║ │
│ ╚═══════════════════════════════════════════════════════════════════════════════╝ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────────────┐
│ Facade Layer │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
│ │ gglib-gui │ │
│ │ Shared GUI backend (ensures feature parity across adapters) │ │
│ └─────────────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────┘
│
┌───────────────────┼───────────────────┐
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────────────────┐
│ Adapter Layer │
│ │
│ ┌─────────────────────┐ ┌──────────────────────┐ ┌──────────────────────────┐ │
│ │ gglib-cli │ │ gglib-axum │ │ gglib-tauri │ │
│ │ CLI interface │ │ HTTP server │ │ Desktop application │ │
│ │ (terminal UI) │ │ ┌────────────────┐ │ │ ┌────────────────────┐ │ │
│ │ │ │ │ Serves React │ │ │ │ Embeds React UI │ │ │
│ │ │ │ │ UI (static) │ │ │ │ (WebView assets) │ │ │
│ │ │ │ └────────────────┘ │ │ ├────────────────────┤ │ │
│ │ │ │ │ │ │ Embedded Axum │ │ │
│ │ │ │ │ │ │ (HTTP endpoints) │ │ │
│ │ │ │ │ │ └────────────────────┘ │ │
│ └─────────┬───────────┘ └──────────┬───────────┘ └───────────┬──────────────┘ │
│ │ │ │ │
│ └─────────────────────────┼──────────────────────────┘ │
│ │ │
│ All adapters call infrastructure layer via: │
│ • External Gateways (runtime, download) │
│ • Other infrastructure services (db, gguf, mcp, proxy) │
│ │ │
└───────────────────────────────────────┼─────────────────────────────────────────────┘
│
▼
╔═══════════════════════╗
║ External Gateways ║
║ (from infra layer) ║
╚═══════════════════════╝
│
┌───────────────────┴────────────────────┐
▼ ▼
┌──────────────────────┐ ┌──────────────────────┐
│ gglib-runtime │ │ gglib-download │
│ spawns/manages │ │ calls HF Hub API │
└──────────┬───────────┘ └──────────┬───────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────────────────────────────────────────────┐
│ External Systems │
│ │
│ ┌──────────────────────────────┐ │
│ │ llama-server instances │ │
│ │ (child processes) │ │
│ └──────────────────────────────┘ │
│ │
│ ┌──────────────────────────────┐ │
│ │ HuggingFace Hub API │ │
│ │ (HTTPS endpoints) │ │
│ └──────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────┘
```
Architecture Principles:
- Unified access: All adapters call through the infrastructure layer, never directly to external systems
- External gateways: Only `gglib-runtime` and `gglib-download` touch external systems
- Tauri architecture: Embeds both React UI assets AND Axum HTTP server internally
- React UI as artifact: Static files in Axum, bundled assets in Tauri, unused in CLI
- Python hf_xet: Internal subprocess within `gglib-download`, not an architectural boundary
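To make the layering rule concrete, here is a minimal sketch of the port-and-adapter pattern the crates follow. All names below (`ModelRecord`, `ModelRepository`, `SqliteModelRepository`) are hypothetical illustrations, not actual gglib types; the point is only that the core crate defines the trait, an infrastructure crate implements it, and adapters depend on the trait alone.

```rust
// Hypothetical illustration of the layering rule; names are not from gglib.

// --- core layer: pure domain types and ports, no infra dependencies ---
pub struct ModelRecord {
    pub id: i64,
    pub name: String,
    pub path: String,
}

pub trait ModelRepository {
    fn list(&self) -> Result<Vec<ModelRecord>, String>;
}

// --- infrastructure layer: implements the port (e.g. on top of SQLite) ---
pub struct SqliteModelRepository { /* connection handle would live here */ }

impl ModelRepository for SqliteModelRepository {
    fn list(&self) -> Result<Vec<ModelRecord>, String> {
        // Real code would query the database; this stub just returns nothing.
        Ok(Vec::new())
    }
}

// --- adapter layer: depends only on the trait, never on the concrete type ---
pub fn print_models(repo: &dyn ModelRepository) -> Result<(), String> {
    for model in repo.list()? {
        println!("{}\t{}", model.id, model.name);
    }
    Ok(())
}
```

Because adapters only see the trait, swapping out the concrete implementation (or mocking it in tests) never touches adapter code.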
Crate Metrics

| Crate | Tests | Coverage | LOC | Complexity |
|---|---|---|---|---|
| gglib-core | | | | |

| Crate | Tests | Coverage | LOC | Complexity |
|---|---|---|---|---|
| gglib-db | | | | |
| gglib-gguf | | | | |
| gglib-hf | | | | |
| gglib-download | | | | |
| gglib-mcp | | | | |
| gglib-proxy | | | | |
| gglib-runtime | | | | |

| Crate | Tests | Coverage | LOC | Complexity |
|---|---|---|---|---|
| gglib-gui | | | | |

| Crate | Tests | Coverage | LOC | Complexity |
|---|---|---|---|---|
| gglib-cli | | | | |
| gglib-axum | | | | |
| gglib-tauri | | | | |
- `models` – GGUF model metadata, GUI/API DTOs, and data structures
- `services` – TypeScript client layer for GUI frontends
- `commands` – CLI command handlers and web API endpoints
- `utils` – Lower-level helpers for parsing and utilities

- `components` – React UI components
- `contexts` – React Context providers
- `hooks` – Custom React hooks
- `pages` – Top-level page components
- `types` – Shared TypeScript type definitions
Each crate has its own README with architecture diagrams, module breakdowns, and usage examples:
| Layer | Crate | Description |
|---|---|---|
| Core | gglib-core | Pure domain types, ports & traits |
| Infra | gglib-db | SQLite repository implementations |
| Infra | gglib-gguf | GGUF file format parser |
| Infra | gglib-hf | HuggingFace Hub client |
| Infra | gglib-download | Download queue & manager |
| Infra | gglib-mcp | MCP server management |
| Infra | gglib-proxy | OpenAI-compatible proxy server |
| Infra | gglib-runtime | Process manager & system probes |
| Facade | gglib-gui | Shared GUI backend (feature parity) |
| Adapter | gglib-cli | CLI interface |
| Adapter | gglib-axum | HTTP API server |
| Adapter | gglib-tauri | Desktop GUI (Tauri + React) |
GGLib provides a streamlined installation process through the included Makefile, for both developers and end users.
Download the latest release for your platform from the Releases page.
- Download the macOS release tarball for your architecture:
  - `gglib-gui-*-aarch64-apple-darwin.tar.gz` for Apple Silicon (M1/M2/M3)
  - `gglib-gui-*-x86_64-apple-darwin.tar.gz` for Intel Macs
- Extract the archive: `tar -xzf gglib-gui-*.tar.gz`
- Double-click `macos-install.command` (or run `./macos-install.command` in Terminal)
- The script will remove the quarantine attribute and optionally install to `/Applications`
Note: macOS marks downloaded apps as "damaged" because they are not code-signed. The install script fixes this automatically by removing the quarantine attribute.
Download and extract the Windows release (gglib-gui-*-x86_64-pc-windows-msvc.zip), then run gglib-gui.exe.
Download and extract the Linux release (gglib-gui-*-x86_64-unknown-linux-gnu.tar.gz), then run the gglib-gui binary.
The recommended way to install GGLib is using the Makefile:
```bash
# Clone the repository
git clone https://github.com/mmogr/gglib.git
cd gglib

# Full setup: check dependencies, build, and install
make setup
```

The `make setup` command will:
- Check for required system dependencies (Rust, Node.js, build tools)
- Provision the managed Miniconda environment used by the hf_xet fast download helper
- Build the web UI frontend
- Build and install the CLI tool to `~/.cargo/bin/`
- Optionally install llama.cpp with automatic GPU detection
`make setup` (and `gglib check-deps`) exits with an error if Python/Miniconda is missing or the fast download helper cannot be prepared. Run these commands first on new machines so large downloads succeed without manual intervention.
Note: When installed via make setup, GGLib operates in Developer Mode. It will keep its database (gglib.db), configuration (.env), and llama.cpp binaries inside your repository folder. This keeps your development environment self-contained. (Downloaded models are still stored in ~/.local/share/llama_models by default).
The Makefile provides several convenient targets:
Installation & Setup:
- `make setup` - Full setup (dependencies + build + install + llama.cpp)
- `make install` - Build and install CLI to `~/.cargo/bin/`
- `make uninstall` - Full cleanup: removes binary, system data, database, and cleans the repo (preserves downloaded models)
Building:
- `make build` - Build release binary
- `make build-dev` - Build debug binary
- `make build-gui` - Build web UI frontend
- `make build-tauri` - Build desktop GUI application
- `make build-all` - Build everything (CLI + web UI)
Development:
- `make test` - Run all tests
- `make check` - Check code without building
- `make fmt` - Format code
- `make lint` - Run clippy linter
- `make doc` - Generate and open documentation
llama.cpp Management:
- `make llama-install-auto` - Install llama.cpp with auto GPU detection
- `make llama-status` - Show llama.cpp installation status
- `make llama-update` - Update llama.cpp to latest version
Running:
- `make run-gui` - Launch desktop GUI
- `make run-web` - Start web server
- `make run-serve` - Run model server
- `make run-proxy` - Run OpenAI-compatible proxy
Cleaning:
- `make clean` - Remove build artifacts
- `make clean-gui` - Remove web UI build
- `make clean-llama` - Remove llama.cpp installation
- `make clean-db` - Remove database files
If you prefer to use Cargo directly:
```bash
# Install from source
cargo install --path .

# Or install from crates.io (when published)
cargo install gglib
```

- Rust 1.70 or later - Install Rust
- Python 3 via Miniconda (required for the hf_xet fast download helper) - Install Miniconda
- Node.js 18+ (for web UI) - Install Node.js
- SQLite 3.x
- Build tools (platform-specific):
  - macOS: `xcode-select --install` and `brew install cmake`
  - Ubuntu/Debian: `sudo apt install build-essential cmake git`
  - Fedora/RHEL: `sudo dnf install gcc-c++ cmake git`
  - Arch Linux: `sudo pacman -S base-devel cmake git`
  - Windows: Visual Studio 2022 with C++ tools, CMake, and Git
Note: llama.cpp is managed by GGLib itself. You don't need to install it separately!
Downloaded GGUF files now live in a user-configurable directory (default: ~/.local/share/llama_models). You can change it at any time using whichever interface is most convenient:
- During `make setup` – the installer now prompts for the location and accepts Enter to keep the default.
- Environment file – copy `.env.example` to `.env` and set `GGLIB_MODELS_DIR=/absolute/path`, or edit the value via `gglib config models-dir set` (see below). All helpers expand `~/` and will create the directory when needed.
- CLI overrides – use `gglib --models-dir /tmp/models download …` for a one-off run, or persist the change with `gglib config models-dir prompt|set <path>`.
- GUI/Web settings – click the gear icon in the header to open the Settings modal, review the current directory, and update it without touching the CLI.
The precedence order is: CLI `--models-dir` flag → `GGLIB_MODELS_DIR` from the environment/`.env` → default path. All download code paths rely on the shared helper in `src/utils/paths.rs`, so whichever option you choose applies consistently across CLI, desktop, web, and background tasks.
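The sketch below illustrates that precedence with a hypothetical function; it is not the code in `src/utils/paths.rs`, and the `~` expansion and missing-directory handling are simplified.

```rust
use std::env;
use std::path::PathBuf;

/// Illustrative only: resolve the models directory as
/// CLI flag -> GGLIB_MODELS_DIR -> default path.
fn resolve_models_dir(cli_flag: Option<&str>) -> PathBuf {
    let raw = cli_flag
        .map(|s| s.to_owned())
        .or_else(|| env::var("GGLIB_MODELS_DIR").ok())
        .unwrap_or_else(|| "~/.local/share/llama_models".to_owned());

    // Simplified ~ expansion; the real helper also creates the directory.
    if let Some(rest) = raw.strip_prefix("~/") {
        if let Some(home) = env::var_os("HOME") {
            return PathBuf::from(home).join(rest);
        }
    }
    PathBuf::from(raw)
}

fn main() {
    // A one-off override wins over the environment and the default.
    println!("{}", resolve_models_dir(Some("/tmp/models")).display());
    println!("{}", resolve_models_dir(None).display());
}
```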
Changing the directory only affects future downloads and servers—it does not move any GGUF files you already downloaded. If you want your existing models in the new location, move them manually and then rescan/add them as needed.
Large GGUFs can saturate a single HTTP stream, so gglib bundles a managed Python helper that talks to Hugging Face's `hf_xet` service. Fast downloads are now the only path: if the helper is missing or broken, commands like `gglib download` will fail until you repair the environment.
- On the first run (or after `gglib check-deps`/`make setup`), gglib provisions a Miniconda environment under `<data_root>/.conda/gglib-hf-xet` and installs `huggingface_hub>=1.1.5` plus `hf_xet>=0.6`. A tiny helper script lives in `<data_root>/.gglib-runtime/python/hf_xet_downloader.py`.
- The helper emits newline-delimited JSON so both the CLI and GUI can keep their existing progress indicators (see the sketch below).
- Missing Python packages are treated as errors. Run `gglib check-deps` or `make setup` to reinstall the managed environment; there is no legacy Rust HTTP fallback anymore.
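The exact event format is an internal detail of the helper, so the following is only a hedged sketch (assuming a `serde_json` dependency) of how a parent process can consume newline-delimited JSON progress lines from a spawned script; the Python command and the `downloaded`/`total` field names are placeholders, not the real gglib protocol.

```rust
use std::io::{BufRead, BufReader};
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    // Placeholder invocation; gglib actually runs its managed Miniconda Python here.
    let mut child = Command::new("python3")
        .arg("hf_xet_downloader.py")
        .stdout(Stdio::piped())
        .spawn()?;

    let stdout = child.stdout.take().expect("stdout was piped above");
    for line in BufReader::new(stdout).lines() {
        let line = line?;
        // Each line is one JSON object; parse it leniently and pull out a
        // hypothetical downloaded/total byte pair for a progress indicator.
        match serde_json::from_str::<serde_json::Value>(&line) {
            Ok(event) => {
                let done = event.get("downloaded").and_then(|v| v.as_u64()).unwrap_or(0);
                let total = event.get("total").and_then(|v| v.as_u64()).unwrap_or(0);
                eprintln!("progress: {}/{} bytes", done, total);
            }
            Err(_) => eprintln!("unparsed line: {}", line),
        }
    }

    child.wait()?;
    Ok(())
}
```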
Requirements: install Miniconda (or another Python 3 distribution with venv support) and ensure enough disk space to populate the per-user `.conda/gglib-hf-xet` directory. The helper respects the same Hugging Face tokens you pass to `gglib download` and does not change how downloads are recorded in the SQLite database.
GGLib provides three complementary interfaces for interacting with GGUF models. All interfaces share the same backend implementation (database, services, process manager, and proxy), ensuring consistent behavior and data across all modes.
Command-line interface for GGUF model management and service control.
Capabilities:
- Model operations: `gglib add`, `gglib list`, `gglib remove`, `gglib update`
- HuggingFace Hub integration: `gglib download`, `gglib search`, `gglib browse`
- Direct terminal chat: `gglib chat <id|name>`
- Server management: `gglib serve`, `gglib proxy`
- Interface launchers: `gglib gui`, `gglib web`
- llama.cpp management: `gglib llama install`, `gglib llama update`
Cross-platform desktop application built with Tauri (Rust backend) and React frontend.
Technical details:
- Launched via `gglib gui` command
- Uses shared `GuiBackend` service for all operations
- Spawns embedded HTTP API server on localhost for frontend-backend communication
- React frontend communicates via standard HTTP endpoints (`/api/models`, `/api/servers`, etc.)
- Same API routes as standalone web server
- Shares business logic, data model, and process management with other interfaces
Browser-based interface backed by Axum HTTP server.
Technical details:
- Started via `gglib web` command
- Default binding: `0.0.0.0:9887` (LAN accessible)
- API routes: `/api/models`, `/api/servers`, `/api/chat`, `/api/proxy/...`
- React frontend (in `src/`) uses the same HTTP endpoints as the Tauri embedded server
- Services layer (`TauriService`, `ChatService`) detects the environment and uses either Tauri IPC (`invoke`) or HTTP calls
The web server provides browser-based access to GGLib's functionality via an Axum HTTP API.
Configuration:
```bash
# Start web server (binds to 0.0.0.0:9887 by default)
gglib web --port 9887 --base-port 9000
```

Access:
- Local: `http://localhost:9887/`
- Network: `http://<HOST_IP>:9887/`
- API: `http://<HOST_IP>:9887/api`
Parameters:
- `--port`: Web server port (default: 9887)
- `--base-port`: Starting port for llama-server instances (default: 9000)
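Any HTTP client can talk to these routes once `gglib web` is running. A minimal Rust sketch, assuming the default port and a `reqwest` dependency with the `blocking` feature enabled (the response is printed raw because the JSON shape is not documented here):

```rust
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // List models through the web server's API (default binding 0.0.0.0:9887).
    let body = reqwest::blocking::get("http://localhost:9887/api/models")?.text()?;
    println!("{}", body);
    Ok(())
}
```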
The proxy provides OpenAI API-compatible endpoints for model inference. This enables GGLib to work seamlessly with OpenWebUI and other tools that support the OpenAI API format.
Configuration:
```bash
# Start proxy (binds to 127.0.0.1:8080 by default)
gglib proxy --host 0.0.0.0 --port 8080 --llama-port 5500
```

Endpoints:
- Base URL: `http://<HOST_IP>:8080/v1/`
- `/v1/models` - List available models
- `/v1/chat/completions` - Chat completions
- `/health` - Health check
Parameters:
- `--host`: Bind address (default: 127.0.0.1; use 0.0.0.0 for network access)
- `--port`: Proxy port (default: 8080)
- `--llama-port`: Starting port for llama-server instances (default: 5500)
- `--default-context`: Default context size (default: 4096)
Features:
- Automatic model server management (start/stop on demand)
- Request routing to appropriate llama-server instances
- Full OpenAI SDK compatibility
- Seamless integration with OpenWebUI
OpenWebUI Integration:
To use GGLib with OpenWebUI:
- Start the proxy with network access: `gglib proxy --host 0.0.0.0 --port 8080`
- In OpenWebUI settings, configure:
  - API Base URL: `http://localhost:8080/v1`
  - API Key: (any value, not validated)
- Select models from the dropdown - GGLib will automatically start the appropriate llama-server
For details on developing the desktop GUI, see src-tauri/README.md.
The desktop build embeds the same REST API that powers `gglib web`. It binds to `http://localhost:8888` by default; make sure that port is free before running `gglib gui`. You can pick a different port by setting `GGLIB_GUI_API_PORT=<port>` before launching. The desktop UI detects the port at runtime, so no frontend rebuild is required.
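The override behaves roughly like the sketch below; this is an illustration of the documented environment variable and default, not gglib's actual startup code (the fallback-on-invalid-value behavior is an assumption).

```rust
use std::env;

/// Illustrative sketch: GGLIB_GUI_API_PORT overrides the default embedded API port.
fn gui_api_port() -> u16 {
    env::var("GGLIB_GUI_API_PORT")
        .ok()
        .and_then(|value| value.parse::<u16>().ok())
        .unwrap_or(8888)
}

fn main() {
    println!("embedded API will listen on http://localhost:{}", gui_api_port());
}
```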
```bash
export OPENAI_BASE_URL="http://localhost:8080/v1"
export OPENAI_API_KEY="dummy"
```

Then use any OpenAI-compatible client library as you normally would.
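If you would rather not pull in an SDK, the request is plain JSON over HTTP. A minimal Rust sketch (assuming `reqwest` with the `blocking` and `json` features plus `serde_json`) reads the two environment variables above and sends a chat completion request; `"my-model"` is a placeholder for one of the models `gglib list` shows.

```rust
use std::env;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Base URL and key come from the environment variables shown above.
    let base = env::var("OPENAI_BASE_URL").unwrap_or_else(|_| "http://localhost:8080/v1".into());
    let key = env::var("OPENAI_API_KEY").unwrap_or_else(|_| "dummy".into());

    // Standard OpenAI-style chat completion payload; "my-model" is a placeholder.
    let payload = serde_json::json!({
        "model": "my-model",
        "messages": [{ "role": "user", "content": "Say hello in one sentence." }]
    });

    let response = reqwest::blocking::Client::new()
        .post(format!("{}/chat/completions", base))
        .bearer_auth(key)
        .json(&payload)
        .send()?
        .text()?;

    println!("{}", response);
    Ok(())
}
```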
Network Binding:
- Web server binds to `0.0.0.0` by default (network accessible)
- Proxy binds to `127.0.0.1` by default (local only)
- Use `--host 0.0.0.0` for network access to the proxy
Authentication:
- No authentication required by default
- Designed for trusted network environments
Recommendations:
- Use firewall rules to restrict access to trusted IP ranges
- Only expose on private networks (192.168.x.x, 10.x.x.x, 172.16-31.x.x)
- Use VPN for access from outside local network
- Do not port-forward to public internet without additional authentication
📚 View Full API Documentation →
The complete API documentation is automatically generated from the source code and hosted on GitHub Pages. It includes:
- 🔍 Detailed API reference for all public functions and types
- 💡 Usage examples and code snippets
- 🏗️ Architecture overview of the codebase
- 🔧 Developer guides for contributing
The documentation is automatically updated with every release, so you'll always have access to the latest information.