Zig wrapper for chatllm.cpp — run ChatGLM, Qwen, Llama, DeepSeek, and 70+ other LLMs locally.
- 🚀 Pure Zig build system — no CMake, no Make, just `zig build`
- 📦 CLI + Library — use as a standalone tool or embed in your Zig project
- 🌐 API server — OpenAI, Ollama, and llama.cpp compatible endpoints
- 💬 Interactive chat — REPL with session management
- 📥 Model registry — download pre-quantized models with `chatllm pull`
- ⚡ Streaming — real-time token streaming with callbacks
- 🧮 Embeddings — generate text embeddings for RAG applications
- 🔧 GPU backends — CUDA, Vulkan, Metal support (via build options)
- 📱 Cross-platform — Windows, macOS, Linux, FreeBSD
chatllm.zig supports all models from chatllm.cpp, including:
| Family | Models |
|---|---|
| Llama | Llama 3.x, Llama 2, Code Llama |
| Qwen | Qwen 3, Qwen 2.5, Qwen VL, Qwen Audio |
| ChatGLM | ChatGLM 4, GLM-4, CharacterGLM |
| DeepSeek | DeepSeek V3, DeepSeek Coder |
| Google | Gemma 3, Gemma 2 |
| Mistral | Mistral 7B, Mixtral |
| Microsoft | Phi-4, Phi-3 |
| Others | InternLM, Yi, Falcon, Baichuan, ERNIE, and 60+ more |
See the chatllm.cpp model list for all supported models.
```bash
# Clone with submodule
git clone --recursive https://github.com/bkataru/chatllm.zig.git
cd chatllm.zig

# Build
zig build -Doptimize=ReleaseFast

# The CLI is at zig-out/bin/chatllm
./zig-out/bin/chatllm --help
```

Add to your `build.zig.zon`:
```zig
.dependencies = .{
    .chatllm = .{
        .url = "https://github.com/bkataru/chatllm.zig/archive/refs/heads/main.tar.gz",
        // Add hash after first build attempt
    },
},
```
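If you prefer not to fill in the hash by hand, the standard `zig fetch` workflow can record it for you (a sketch; pin a tag or commit instead of `main` for reproducible builds):

```bash
# Fetches the archive and records its URL and hash in build.zig.zon
zig fetch --save https://github.com/bkataru/chatllm.zig/archive/refs/heads/main.tar.gz
```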
In your `build.zig`:

```zig
const chatllm_dep = b.dependency("chatllm", .{
    .target = target,
    .optimize = optimize,
});

// Add the module
exe.root_module.addImport("chatllm", chatllm_dep.module("chatllm"));

// Link the library
exe.linkLibrary(chatllm_dep.artifact("chatllm"));
```

```bash
# List available models
chatllm pull --list
# Download Qwen3 1.7B (default)
chatllm pull qwen3
# Download specific variant and quantization
chatllm pull llama3.2:3b
chatllm pull llama3.1:8b:q4_1
```

```bash
chatllm chat -m ~/.chatllm/models/qwen3-1.7b.bin

# With system prompt
chatllm chat -m model.bin -s "You are a helpful coding assistant"
```

```bash
chatllm run -m model.bin "Explain quantum computing in simple terms"
```

```bash
chatllm serve -m model.bin -p 8080

# Use with curl
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "local", "messages": [{"role": "user", "content": "Hello!"}]}'
```

```bash
chatllm embedding -m embedding-model.bin "Text to embed"
```

| Command | Description |
|---|---|
| `chat` | Interactive chat session (default) |
| `run` | Run a single prompt |
| `serve` | Start HTTP API server |
| `embedding` | Generate text embeddings |
| `pull` | Download pre-quantized models |
| `list` | List downloaded models |
| `show` | Show model information |
| `rm` | Remove a downloaded model |
Run `chatllm <command> --help` for detailed options.
| Endpoint | Method | Description |
|---|---|---|
| `/v1/chat/completions` | POST | Chat completions (streaming supported) |
| `/v1/embeddings` | POST | Generate embeddings |
| `/v1/models` | GET | List available models |
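For the OpenAI-compatible endpoints above, a streamed chat completion and an embeddings request look roughly like this (a sketch assuming standard OpenAI request shapes; `"model": "local"` and port 8080 follow the serve example earlier):

```bash
# Streamed chat completion (tokens arrive as SSE chunks)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "local", "stream": true,
       "messages": [{"role": "user", "content": "Write a haiku about Zig"}]}'

# Embeddings
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "local", "input": "Text to embed"}'
```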
| Endpoint | Method | Description |
|---|---|---|
| `/api/chat` | POST | Chat completions |
| `/api/tags` | GET | List models |
| `/api/version` | GET | Version info |
| `/api/ps` | GET | Running models |
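Clients that already speak the Ollama protocol should be able to point at the server unchanged. A minimal sketch, assuming the usual Ollama request body (model name and port are placeholders):

```bash
# Single (non-streamed) chat turn via the Ollama-style API
curl http://localhost:8080/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model": "local", "stream": false,
       "messages": [{"role": "user", "content": "Hello!"}]}'

# List models the server knows about
curl http://localhost:8080/api/tags
```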
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/props` | GET | Server properties |
| `/slots` | GET | Slot information |
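These llama.cpp-style routes are plain GET endpoints, so a quick smoke test needs nothing more than curl (port taken from the serve example earlier):

```bash
curl http://localhost:8080/health
curl http://localhost:8080/props
curl http://localhost:8080/slots
```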
The `pull` command downloads pre-quantized models from ModelScope:
```bash
# Model specification format
chatllm pull <model_name>[:<variant>][:<quantization>]

# Examples
chatllm pull qwen3              # qwen3:1.7b:q8 (default)
chatllm pull qwen3:4b           # qwen3:4b:q8
chatllm pull llama3.1:8b:q4_1

# Switch to HuggingFace
chatllm pull --registry huggingface qwen3
```

Models are stored in `~/.chatllm/models/`.
Models are stored in the user's home directory:
| Platform | Path |
|---|---|
| Windows | `%USERPROFILE%\.chatllm\models\` |
| macOS/Linux | `~/.chatllm/models/` |
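Downloaded models can then be managed entirely from the CLI. A sketch, assuming `list`, `show`, and `rm` accept the same model name used with `pull`:

```bash
chatllm pull qwen3     # download
chatllm list           # show what is on disk
chatllm show qwen3     # inspect model details
chatllm rm qwen3       # remove the local copy
```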
```bash
zig build                        # Debug build
zig build -Doptimize=ReleaseFast # Release build
```

| Option | Description |
|---|---|
| `-Doptimize=ReleaseFast` | Optimized release build |
| `-Dcuda=true` | Enable CUDA backend (NVIDIA) |
| `-Dvulkan=true` | Enable Vulkan backend |
| `-Dmetal=true` | Enable Metal backend (macOS) |
| `-Dchatllm_path=<path>` | Custom chatllm.cpp path |
| `-Dtarget=<triple>` | Cross-compile target |
```bash
# macOS with Metal
zig build -Doptimize=ReleaseFast -Dmetal=true

# NVIDIA GPU support
zig build -Doptimize=ReleaseFast -Dcuda=true

# Cross-compile for Linux
zig build -Doptimize=ReleaseFast -Dtarget=x86_64-linux-gnu
```

Run the test suite with:

```bash
zig build test
```

```
chatllm.zig/
├── src/
│ ├── main.zig # CLI entry point
│ └── commands/
│ ├── chat.zig # Interactive chat
│ ├── run.zig # Single prompt
│ ├── serve.zig # API server
│ ├── embedding.zig # Embeddings
│ ├── pull.zig # Model download
│ └── list.zig # List models
├── chatllm.cpp.zig/
│ └── chatllm.zig # Zig bindings for chatllm.cpp
├── chatllm.cpp/ # chatllm.cpp submodule
├── build.zig # Main build file
├── build.zig.zon # Package manifest
├── build_chatllm.zig # chatllm.cpp build integration
├── docs/ # Documentation
└── examples/            # Example code
```
| Platform | Architecture | Status |
|---|---|---|
| Windows 11 | x86_64 | ✅ |
| macOS 14+ | aarch64 (Apple Silicon) | ✅ |
| macOS 14+ | x86_64 (Intel) | ✅ |
| Ubuntu 22.04 | x86_64 | ✅ |
| Ubuntu 22.04 | aarch64 | ✅ |
| Backend | Status | Platforms |
|---|---|---|
| CPU | ✅ Stable | All |
| Metal | ✅ Stable | macOS |
| CUDA | 🔧 Experimental | Linux, Windows |
| Vulkan | 🔧 Experimental | All |
| OpenCL | 📋 Planned | - |
```zig
const std = @import("std");
const chatllm = @import("chatllm");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Initialize
    _ = chatllm.ChatLLM.globalInit();

    var llm = try chatllm.ChatLLM.init(allocator);
    defer llm.deinit();

    // Configure
    try llm.appendParam("-m");
    try llm.appendParam("model.bin");

    // Set up callbacks
    const Ctx = struct {
        pub fn onPrint(_: *@This(), ptype: chatllm.PrintType, text: []const u8) void {
            if (ptype == .chat_chunk) {
                std.debug.print("{s}", .{text});
            }
        }

        pub fn onEnd(_: *@This()) void {
            std.debug.print("\n", .{});
        }
    };

    var ctx = Ctx{};
    var callback = chatllm.CallbackContext(*Ctx){
        .user_data = &ctx,
        .print_fn = Ctx.onPrint,
        .end_fn = Ctx.onEnd,
    };

    try llm.startWithContext(*Ctx, &callback);
    try llm.userInput("Hello, world!");
}
```

See the `examples/` directory for more usage patterns.
- Core CLI (chat, run, serve, embedding)
- Model registry and download
- OpenAI-compatible API server
- Ollama-compatible API endpoints
- Zig library bindings
- GPU acceleration (CUDA, Metal, Vulkan)
- Vision model support (Qwen-VL, etc.)
- Audio model support (Qwen-Audio)
- WebSocket API
- RAG integration
- Function calling / tool use
- chatllm.cpp — The underlying C++ inference engine
- GGML — Tensor library for ML
- Zig — The programming language
MIT License — see LICENSE for details.
chatllm.cpp is licensed under the MIT License.