gguf
Here are 729 public repositories matching this topic...
Distribute and run LLMs with a single file.
Updated Jun 2, 2026 · C++
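Every repository on this page reads, writes, or serves GGUF files. For orientation, here is a minimal sketch that parses only the fixed-size header at the start of a GGUF file. It assumes the v2/v3 layout: the 4-byte magic `GGUF`, a little-endian `uint32` version, then `uint64` tensor and metadata key/value counts. The function names are hypothetical, the sample counts are synthetic, and the metadata and tensor-info sections that follow the header are not parsed.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4-byte magic + uint32 version + uint64 tensor count + uint64 metadata count


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed GGUF v2/v3 header from the first bytes of a file (little-endian)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("too short for a GGUF v2/v3 header")
    magic = data[:4]
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    # "<IQQ": no padding; uint32 version, then two uint64 counts, starting right after the magic.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    if version < 2:
        raise ValueError("GGUF v1 used 32-bit counts; not handled in this sketch")
    return {"version": version, "tensor_count": tensor_count, "metadata_kv_count": kv_count}


def read_gguf_header_from_file(path: str) -> dict:
    """Read just enough bytes from disk to parse the header."""
    with open(path, "rb") as f:
        return read_gguf_header(f.read(HEADER_SIZE))


# Usage with a synthetic header: version 3, 291 tensors, 24 metadata pairs.
sample = GGUF_MAGIC + struct.pack("<IQQ", 3, 291, 24)
print(read_gguf_header(sample))
```

Checking the magic before anything else is a cheap way to reject non-GGUF files before allocating anything for tensors.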
⚡ Python-free Rust inference server, compatible with the OpenAI API. Supports GGUF and SafeTensors, hot model swapping, and auto-discovery, all in a single binary. FREE now, FREE forever.
Updated Jun 1, 2026 · Rust
Maid is a free and open source application for interfacing with llama.cpp models locally, and with Anthropic, DeepSeek, Ollama, Mistral and OpenAI models remotely.
Updated Apr 7, 2026 · TypeScript
Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.
Updated May 26, 2026 · Python
Learn Ollama hands-on and deploy large models using only a CPU. Read it online at: https://datawhalechina.github.io/handy-ollama/
Updated Jan 15, 2026 · Jupyter Notebook
The Swiss Army Knife of Offline AI. Chat, speak, and generate images, privacy first, with zero internet. Download an LLM and use it on your mobile device; no data ever leaves your phone. Supports text-to-text, vision, and text-to-image.
Updated Jun 2, 2026 · TypeScript
LLM agent framework in ComfyUI that includes an MCP server and nodes for Omost, GPT-SoVITS, ChatTTS, GOT-OCR2.0, and FLUX prompts, with access to Feishu and Discord. It adapts to any LLM with an OpenAI-like or aisuite-like interface, such as o1, Ollama, Gemini, Grok, Qwen, GLM, DeepSeek, Kimi, and Doubao, and to local LLMs, VLMs, and GGUF models such as Llama-3.3 and Janus-Pro, with linkage to GraphRAG.
Updated Mar 8, 2026 · Python
Run AI models locally on your machine with Node.js bindings for llama.cpp. Enforce a JSON schema on the model output at the generation level.
Updated May 27, 2026 · TypeScript
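Enforcing a schema "at the generation level" means invalid tokens are masked out before one is picked, rather than validating text after the fact. The toy sketch below shows that idea with greedy decoding under a simplifying assumption: the allowed outputs are enumerated as a short list of strings, whereas real JSON-schema enforcement typically compiles the schema into a grammar. The scoring function is a hypothetical stand-in for a model, and this is not how the listed project implements it.

```python
from typing import Callable, Dict, List

ScoreFn = Callable[[str], Dict[str, float]]


def constrained_greedy_decode(score_tokens: ScoreFn, allowed_outputs: List[str], max_steps: int = 64) -> str:
    """Greedy decoding that masks out any token which would leave the set of allowed outputs."""
    text = ""
    for _ in range(max_steps):
        if text in allowed_outputs:
            return text  # stop as soon as a complete allowed output is produced
        scores = score_tokens(text)
        # Mask: keep only non-empty tokens whose addition is still a prefix of some allowed output.
        valid = {
            tok: s
            for tok, s in scores.items()
            if tok and any(out.startswith(text + tok) for out in allowed_outputs)
        }
        if not valid:
            raise RuntimeError(f"no valid continuation after {text!r}")
        text += max(valid, key=valid.get)  # greedy pick among the surviving tokens
    raise RuntimeError("max_steps reached without a complete output")


# Hypothetical stand-in for a model: fixed scores that favor off-schema tokens.
TOY_SCORES = {'Sure! ': 3.0, '"maybe"': 2.0, '"no"': 1.5, '"yes"': 1.0, '{"answer": ': 0.5, "}": 0.1}


def toy_model(prefix: str) -> Dict[str, float]:
    return dict(TOY_SCORES)  # ignores the prefix; a real model would condition on it


ALLOWED = ['{"answer": "yes"}', '{"answer": "no"}']
print(constrained_greedy_decode(toy_model, ALLOWED))
```

Without the mask, the toy model's top token would be `Sure! `; with it, decoding yields `{"answer": "no"}`, the highest-scoring path that stays inside the allowed set.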
Interface for OuteTTS models.
Updated Mar 23, 2026 · Python
A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.
Updated Jun 2, 2026 · Python
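For context on what "low-bit" means here, the sketch below shows the simplest baseline that such algorithms improve on: block-wise symmetric round-to-nearest 4-bit quantization with one float scale per block. This is plain rounding, not the SOTA algorithm of the project above, and it is not byte-compatible with any GGUF quantization type; the block size of 32 and the function names are assumptions for illustration.

```python
import math
from typing import List, Tuple

BLOCK_SIZE = 32  # assumed block size for illustration


def quantize_block_int4(xs: List[float]) -> Tuple[float, List[int]]:
    """Symmetric round-to-nearest 4-bit quantization of a single block."""
    amax = max(abs(x) for x in xs)
    # One float scale per block maps [-amax, amax] onto the integer range [-7, 7].
    scale = amax / 7 if amax > 0 else 1.0
    # round() keeps values in [-7, 7]; clamping to the signed 4-bit range [-8, 7] is a safeguard.
    qs = [max(-8, min(7, round(x / scale))) for x in xs]
    return scale, qs


def quantize_int4(xs: List[float]) -> List[Tuple[float, List[int]]]:
    """Split a weight vector into blocks and quantize each block independently."""
    return [quantize_block_int4(xs[i:i + BLOCK_SIZE]) for i in range(0, len(xs), BLOCK_SIZE)]


def dequantize_int4(blocks: List[Tuple[float, List[int]]]) -> List[float]:
    """Reconstruct approximate weights as q * scale for every block."""
    out: List[float] = []
    for scale, qs in blocks:
        out.extend(q * scale for q in qs)
    return out


# Usage: quantize synthetic "weights" and measure the worst-case reconstruction error.
weights = [2.5 * math.sin(0.37 * i) for i in range(100)]
blocks = quantize_int4(weights)
restored = dequantize_int4(blocks)
print(f"{len(blocks)} blocks, max abs error {max(abs(a - b) for a, b in zip(weights, restored)):.4f}")
```

Per-block scales keep one outlier from inflating the rounding error of the whole tensor: each element's error is bounded by half of its own block's scale.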
An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifact.
Updated May 30, 2026 · Go
A CLI to estimate inference memory requirements for Hugging Face models, written in Python.
Updated May 18, 2026 · Python
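A common back-of-envelope estimate of inference memory adds two terms: the quantized weights (parameters × bits per weight ÷ 8) and the key/value cache (2 × layers × context length × KV heads × head dimension × bytes per element). The sketch below implements only that rough formula, assuming an fp16 KV cache and ignoring activations, scratch buffers, and runtime overhead; it is not the method used by the CLI above, and the function name and example shapes are assumptions.

```python
def estimate_inference_memory_bytes(
    n_params: float,
    bits_per_weight: float,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    n_ctx: int,
    kv_bytes_per_elem: int = 2,  # fp16 KV cache assumed
) -> dict:
    """Back-of-envelope estimate: quantized weights plus the K/V cache."""
    weights = n_params * bits_per_weight / 8
    # K and V each store n_kv_heads * head_dim values per layer per token of context.
    kv_cache = 2 * n_layers * n_ctx * n_kv_heads * head_dim * kv_bytes_per_elem
    return {"weights": weights, "kv_cache": kv_cache, "total": weights + kv_cache}


GIB = 2 ** 30
# Assumed Llama-3-8B-like shapes and ~4.5 bits/weight; substitute a real model's config.
est = estimate_inference_memory_bytes(
    n_params=8e9, bits_per_weight=4.5, n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=8192
)
for part, size in est.items():
    print(f"{part:>8}: {size / GIB:.2f} GiB")
```

With these assumed shapes the KV cache term is exactly 1 GiB at an 8192-token context, and it grows linearly with context length, which is why long contexts can rival the weights themselves.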
Llama 3+ inference in pure Java
Updated Apr 24, 2026 · Java
Go library for embedded vector search and semantic embeddings using llama.cpp
Updated Mar 6, 2026 · Go
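Embedded vector search boils down to storing embedding vectors and ranking them by similarity to a query embedding. The sketch below is a brute-force cosine-similarity index, the simplest correct version of that idea; it is not the Go library's API or index structure, the class name is hypothetical, and the 3-dimensional vectors are hand-written toy values standing in for embeddings a model would produce.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class BruteForceIndex:
    """Hypothetical in-memory index: stores (key, vector) pairs and scans them all on search."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def add(self, key: str, vector: Sequence[float]) -> None:
        self._items.append((key, list(vector)))

    def search(self, query: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        """Return the k stored keys most similar to the query, best first."""
        scored = [(key, cosine_similarity(query, vec)) for key, vec in self._items]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]


# Usage with hand-written toy vectors standing in for real model embeddings.
index = BruteForceIndex()
index.add("cat", [0.9, 0.1, 0.0])
index.add("dog", [0.8, 0.2, 0.1])
index.add("car", [0.0, 0.1, 0.9])
for key, score in index.search([1.0, 0.0, 0.0], k=2):
    print(key, round(score, 3))
```

A linear scan is O(n) per query, which is often fine for a few thousand embedded documents; larger collections usually call for an approximate nearest-neighbor structure.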
Private on-device AI suite for Android. Fork of Google AI Edge Gallery with llama.cpp, whisper.cpp, stable-diffusion.cpp, GGUF import, voice chat, vision AI, on-device image generation, biometric lock, encrypted history, and CPU/NPU/GPU acceleration.
Updated Jun 1, 2026 · Kotlin