
LocalLLMClient

License: MIT

A Swift package to interact with local Large Language Models (LLMs) on Apple platforms.

Demo / Multimodal: MobileVLM-3B (llama.cpp) and Qwen2.5 VL 3B (MLX), running on an iPhone 16 Pro.

Example app

Important

This project is still experimental. The API is subject to change.

Features

  • Text generation with local LLMs via llama.cpp (GGUF models) or Apple MLX
  • Apple FoundationModels support (iOS 26.0+ / macOS 26.0+ with Apple Intelligence)
  • Streaming and non-streaming text generation
  • Multimodal input: images alongside text prompts
  • Model downloading from Hugging Face with progress tracking
  • Command line tool

Installation

Add the following dependency to your Package.swift file:

dependencies: [
    .package(url: "https://github.com/tattn/LocalLLMClient.git", branch: "main")
]
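
Then add the products you need to your target's dependencies. A minimal sketch, assuming the product names match the module names imported in the examples below (e.g. LocalLLMClientLlama for llama.cpp, LocalLLMClientUtility for downloads) and a hypothetical target name:

.target(
    name: "YourApp",
    dependencies: [
        .product(name: "LocalLLMClient", package: "LocalLLMClient"),
        .product(name: "LocalLLMClientLlama", package: "LocalLLMClient"),
        .product(name: "LocalLLMClientUtility", package: "LocalLLMClient")
    ]
)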

Usage

The API documentation is available here.

Basic Usage

Using with llama.cpp
import LocalLLMClient
import LocalLLMClientLlama
import LocalLLMClientUtility

// Download model from Hugging Face (Gemma 3)
let ggufName = "gemma-3-4B-it-QAT-Q4_0.gguf"
let downloader = FileDownloader(source: .huggingFace(
    id: "lmstudio-community/gemma-3-4B-it-qat-GGUF",
    globs: [ggufName]
))

try await downloader.download { print("Progress: \($0)") }

// Initialize a client with the downloaded model
let modelURL = downloader.destination.appending(component: ggufName)
let client = try await LocalLLMClient.llama(url: modelURL, parameter: .init(
    context: 4096,      // Context size
    temperature: 0.7,   // Randomness (0.0 to 1.0)
    topK: 40,           // Top-K sampling
    topP: 0.9,          // Top-P (nucleus) sampling
    options: .init(responseFormat: .json) // Response format
))

let prompt = """
Create the beginning of a synopsis for an epic story with a cat as the main character.
Format it in JSON, as shown below.
{
    "title": "<title>",
    "content": "<content>",
}
"""

// Generate text
let input = LLMInput.chat([
    .system("You are a helpful assistant."),
    .user(prompt)
])

for try await text in try await client.textStream(from: input) {
    print(text, terminator: "")
}
Using with Apple MLX
import LocalLLMClient
import LocalLLMClientMLX
import LocalLLMClientUtility

// Download model from Hugging Face
let downloader = FileDownloader(
    source: .huggingFace(id: "mlx-community/Qwen3-1.7B-4bit", globs: .mlx)
)
try await downloader.download { print("Progress: \($0)") }

// Initialize a client with the downloaded model
let client = try await LocalLLMClient.mlx(url: downloader.destination, parameter: .init(
    temperature: 0.7,    // Randomness (0.0 to 1.0)
    topP: 0.9            // Top-P (nucleus) sampling
))

// Generate text
let input = LLMInput.chat([
    .system("You are a helpful assistant."),
    .user("Tell me a story about a cat.")
])

for try await text in try await client.textStream(from: input) {
    print(text, terminator: "")
}
Using with Apple FoundationModels
import LocalLLMClient
import LocalLLMClientFoundationModels

// Available on iOS 26.0+ / macOS 26.0+ and requires Apple Intelligence 
let client = try await LocalLLMClient.foundationModels(
    // Use system's default model
    model: .default,
    // Configure generation options
    parameter: .init(
        temperature: 0.7
    )
)

// Generate text
let input = LLMInput.chat([
    .system("You are a helpful assistant."),
    .user("Tell me a short story about a clever fox.")
])

for try await text in try await client.textStream(from: input) {
    print(text, terminator: "")
}
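
Before creating the client, you can check whether the on-device model can be used. A minimal sketch using Apple's FoundationModels framework directly (this availability API belongs to Apple's framework, not to LocalLLMClient):

import FoundationModels

// Ask the system whether the default on-device model is usable
switch SystemLanguageModel.default.availability {
case .available:
    print("On-device model is ready")
case .unavailable(let reason):
    print("On-device model unavailable: \(reason)")
}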

Multimodal for Image

LocalLLMClient supports multimodal models like LLaVA for processing images along with text prompts.

Using with llama.cpp
import LocalLLMClient
import LocalLLMClientLlama
import LocalLLMClientUtility

// Download model from Hugging Face (Gemma 3)
let model = "gemma-3-4b-it-Q8_0.gguf"
let mmproj = "mmproj-model-f16.gguf"

let downloader = FileDownloader(
    source: .huggingFace(id: "ggml-org/gemma-3-4b-it-GGUF", globs: [model, mmproj]),
)
try await downloader.download { print("Download: \($0)") }

// Initialize a client with the downloaded model
let client = try await LocalLLMClient.llama(
    url: downloader.destination.appending(component: model),
    mmprojURL: downloader.destination.appending(component: mmproj)
)

let input = LLMInput.chat([
    .user("What's in this image?", attachments: [.image(.init(resource: .yourImage))]),
])

// Generate text without streaming
print(try await client.generateText(from: input))
Using with Apple MLX
import LocalLLMClient
import LocalLLMClientMLX
import LocalLLMClientUtility

// Download model from Hugging Face (Qwen2.5 VL)
let downloader = FileDownloader(source: .huggingFace(
    id: "mlx-community/Qwen2.5-VL-3B-Instruct-abliterated-4bit",
    globs: .mlx
))
try await downloader.download { print("Progress: \($0)") }

let client = try await LocalLLMClient.mlx(url: downloader.destination)

let input = LLMInput.chat([
    .user("What's in this image?", attachments: [.image(.init(resource: .yourImage))]),
])

// Generate text without streaming
print(try await client.generateText(from: input))

Utility

  • FileDownloader: A utility to download models with progress tracking.
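
A minimal sketch reusing the FileDownloader API shown above, assuming the reported progress value is a fraction between 0 and 1:

import LocalLLMClientUtility

// Fetch the MLX model files from Hugging Face
let downloader = FileDownloader(
    source: .huggingFace(id: "mlx-community/Qwen3-1.7B-4bit", globs: .mlx)
)

try await downloader.download { progress in
    // Assumes `progress` is a fraction in 0...1
    print("Progress: \(Int(progress * 100))%")
}

// Downloaded files are placed under `destination`
print("Model stored at: \(downloader.destination.path)")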

CLI tool

You can use LocalLLMClient directly from the terminal using the command line tool:

# Run using llama.cpp
swift run localllm --model /path/to/your/model.gguf "Your prompt here"

# Run using MLX
./scripts/run_mlx.sh --model https://huggingface.co/mlx-community/Qwen3-1.7B-4bit "Your prompt here"

Tested models

  • LLaMA 3
  • Gemma 3 / 2
  • Qwen 3 / 2
  • Phi 4

Models compatible with llama.cpp backend
Models compatible with MLX backend

If you have a model that works, please open an issue or PR to add it to the list.

Requirements

  • iOS 16.0+ / macOS 14.0+
  • Xcode 16.0+

Acknowledgements

This package uses llama.cpp and Apple's MLX for model inference.


Support this project ❤️
