LlamaBarn is a tiny menu bar app that lets you install and run local LLMs with just a few clicks. It automatically configures each model to run optimally on your Mac, and exposes a standard API that any app can connect to.
Install with `brew install --cask llamabarn` or download from Releases ↗
LlamaBarn runs as a tiny menu bar app on your Mac.
- Install a model from the built-in catalog -- only models that can run on your Mac are shown
- Select an installed model to run it -- this configures and starts a server at http://localhost:2276
- Use the running model via the API or the web UI -- both at http://localhost:2276
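Once a model is running, you can check that the server is answering before wiring up other apps. A minimal sketch, assuming the default port above and the `/health` endpoint that llama.cpp servers expose (check the llama-server docs if it 404s):

```shell
# Hypothetical helper that builds URLs for the local LlamaBarn server
barn_url() {
  echo "http://localhost:2276$1"
}

barn_url /health   # prints http://localhost:2276/health
# Real usage once a model is selected in the menu bar:
# curl "$(barn_url /health)"
```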
Under the hood LlamaBarn uses llama.cpp and runs models with no external dependencies.
Connect to any app that supports custom APIs:
- chat UIs like Chatbox or Open WebUI
- CLI assistants like OpenCode or Codex
- editors like VS Code or Zed
- editor extensions like Cline or Continue
- custom scripts using `curl` or libs like ai sdk
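Most of these clients just need an OpenAI-compatible base URL. A sketch of how that might look from the shell, assuming the client reads the `OPENAI_BASE_URL` variable (the official OpenAI SDKs do; other tools may use their own setting):

```shell
# Point an OpenAI-compatible client at the local LlamaBarn server.
# OPENAI_BASE_URL is read by the OpenAI SDKs; other tools may differ.
export OPENAI_BASE_URL="http://localhost:2276/v1"
# The local server needs no key, but some clients insist on a non-empty value:
export OPENAI_API_KEY="unused"

echo "$OPENAI_BASE_URL"   # prints http://localhost:2276/v1
```

The exact variable or settings field varies per tool, so check each app's documentation for where to enter a custom API base URL.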
Or use the built-in web UI at http://localhost:2276 to chat with the running model directly.
LlamaBarn builds on the llama.cpp server and supports the same API endpoints:
```shell
# say "Hello" to the running model
curl http://localhost:2276/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```

See the complete reference in the llama-server docs ↗
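For interactive use you will usually want tokens as they are generated rather than one final response. A sketch of the same request with streaming enabled, assuming the server's OpenAI-compatible `stream` parameter (standard in llama.cpp's chat completions endpoint):

```shell
# Same request as above, but with "stream": true so the server sends
# incremental chunks (server-sent events) instead of one final JSON body.
PAYLOAD='{"stream": true, "messages": [{"role": "user", "content": "Hello"}]}'
echo "$PAYLOAD"
# Real usage against the running model:
# curl http://localhost:2276/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$PAYLOAD"
```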
