Holy cow! Now you can talk back to the cow!
Cow is just a humble AI for your computer. 🥺
Cow lets you interact with a local language model, free of charge and as much as you want, all from the comfort of your own home terminal.
> [!NOTE]
> Cow supports 🍎 Apple Silicon and 🐧 Linux x64.
```sh
curl -fsSL https://raw.githubusercontent.com/jolexxa/cow/main/install.sh | bash
```

This downloads the latest release for your platform and installs it to `~/.local/bin/`.
> [!TIP]
> The first time you run Cow, it will download the required model files automatically from Hugging Face.
Cow supports two inference backends:
- llama.cpp via llama_cpp_dart – runs GGUF models on CPU or GPU. Llama.cpp is cross-platform and works just about anywhere.
- MLX via cow_mlx + mlx_dart – runs MLX-format models natively on Apple Silicon, where it tends to outperform llama.cpp by close to an order of magnitude.
A higher-level package called `cow_brain` wraps both backends behind a common `InferenceRuntime` interface and enables high-level agentic functionality (reasoning, tool use, and context management).
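To illustrate, here is a minimal sketch of what a common runtime interface of this shape might look like in Dart. The names and signatures below are illustrative assumptions, not cow_brain's actual API:

```dart
/// Illustrative sketch of a backend-agnostic inference interface.
/// The real `InferenceRuntime` in cow_brain may look quite different.
abstract class InferenceRuntime {
  /// Loads model weights from [modelPath] into memory.
  Future<void> load(String modelPath);

  /// Streams generated tokens for [prompt] as they are produced.
  Stream<String> generate(String prompt, {int maxTokens = 512});

  /// Frees native resources (weights, KV cache).
  Future<void> dispose();
}

/// A trivial in-memory fake, useful for exercising the agentic layer
/// without a real llama.cpp or MLX backend behind it.
class FakeRuntime implements InferenceRuntime {
  @override
  Future<void> load(String modelPath) async {}

  @override
  Stream<String> generate(String prompt, {int maxTokens = 512}) =>
      Stream.fromIterable(['moo', ' ', 'moo']);

  @override
  Future<void> dispose() async {}
}
```

Because both backends sit behind one interface, the agentic layer can be tested against a fake and never needs to know which engine is running underneath.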
On Apple Silicon, Cow uses MLX with Qwen3-8B (4-bit) for primary interactions and Qwen2.5-3B-Instruct (4-bit) for lightweight summarization. On Linux, Cow uses llama.cpp with the equivalent GGUF models.
Cow cannot support arbitrary models. Most models require prompts to follow a specific chat template, usually distributed as Jinja code.
> [!TIP]
> Since Cow avoids re-tokenizing the message history on each interaction, the template for any new model must be implemented in native Dart code. This may involve writing a prompt formatter, stream parser, and/or tool call extractor before hooking them up in the model profiles, app model profiles, and main app.
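As a concrete example, the Qwen family uses a ChatML-style chat template. A minimal Dart formatter for that template might look like the sketch below; the class and function names are illustrative, not Cow's actual formatter API:

```dart
/// A chat message in the simplest possible shape.
class ChatMessage {
  final String role; // 'system', 'user', or 'assistant'
  final String content;
  const ChatMessage(this.role, this.content);
}

/// Formats messages into a ChatML-style prompt:
/// <|im_start|>role\ncontent<|im_end|>\n ... ending with an open
/// assistant turn so the model generates the reply.
String formatChatMl(List<ChatMessage> messages) {
  final buffer = StringBuffer();
  for (final m in messages) {
    buffer.write('<|im_start|>${m.role}\n${m.content}<|im_end|>\n');
  }
  buffer.write('<|im_start|>assistant\n');
  return buffer.toString();
}
```

A real formatter would also have to handle tool definitions and incremental formatting of only the newest messages, so the existing history's tokens can be reused.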
You can change the context size in AppInfo. Be sure you have enough memory to support the context size you choose, or you might just put your computer out to pasture.
Cow is currently a monorepo. All packages live under packages/.
| Package | Description |
|---|---|
| cow | Main terminal application – orchestrates backends, UI, and model management |
| cow_brain | Agentic inference layer – reasoning, tool use, context management, and a common InferenceRuntime interface |
| cow_model_manager | Model installer – downloads and manages LLM model files |
| llama_cpp_dart | Dart FFI bindings for llama.cpp |
| cow_mlx | MLX Swift inference backend (macOS only) – built separately via Xcode |
| mlx_dart | Dart FFI bindings for cow_mlx |
| blocterm | Bridges bloc and nocterm for reactive terminal UIs |
| logic_blocks | Human-friendly hierarchical state machines for Dart |
| collections | Utility collection types used across packages |
For a beautiful terminal UI (TUI), Cow uses nocterm, which is itself still in active development. Cow introduces a package called blocterm so that bloc can be used just as it would be in a typical Flutter application.
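To sketch the pattern blocterm bridges: a bloc-style state holder emits states on a stream, and the terminal UI rebuilds on each emission. The `MiniCubit` below is a self-contained stand-in for package:bloc's `Cubit`, and the stream subscription stands in for a blocterm widget; neither is the real API:

```dart
import 'dart:async';

/// Self-contained stand-in for package:bloc's Cubit: holds a state
/// and broadcasts every new state on a stream.
class MiniCubit<T> {
  MiniCubit(T initial) : _state = initial;
  T _state;
  final _controller = StreamController<T>.broadcast();

  T get state => _state;
  Stream<T> get stream => _controller.stream;

  void emit(T next) {
    _state = next;
    _controller.add(next);
  }

  Future<void> close() => _controller.close();
}

/// A tiny example state holder, as you might write for a TUI counter.
class CounterCubit extends MiniCubit<int> {
  CounterCubit() : super(0);
  void increment() => emit(state + 1);
}
```

In the real app, a blocterm widget plays the role of Flutter's BlocBuilder: it subscribes to the state stream and re-renders its slice of the terminal UI whenever a new state arrives.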
Cow-related contributions to Nocterm:
- fix: quantize colors in environments without true color support
- feat: text selection
- fix: scrollbar position
- fix: render object attach
- Dart SDK (the easiest way is to use FVM to install Flutter, which includes Dart – without a version manager, you'll end up in a stampede)
- Xcode (macOS only – required to build the MLX Swift library and compile Metal shaders)
Cow includes llama.cpp as a git submodule (used for FFI bindings). The --recursive flag pulls it in automatically.
```sh
git clone --recursive https://github.com/jolexxa/cow.git
cd cow
```

Cow is a monorepo with multiple Dart packages under `packages/`.

```sh
dart tool/pub_get.dart
```

Downloads prebuilt llama.cpp binaries for your platform (macOS ARM64 or Linux x64) and places them in `packages/llama_cpp_dart/assets/native/`.

```sh
dart tool/download_llama_assets.dart
```

Builds the CowMLX Swift dynamic library. This requires Xcode (not just the command-line tools) because MLX uses Metal shaders that SwiftPM alone can't compile.

```sh
dart tool/build_mlx.dart
```

```sh
dart run packages/cow/bin/cow.dart
```

All scripts are in `./tool/` and most accept an optional package name (e.g., `cow_brain`, `blocterm`).
```sh
dart tool/pub_get.dart [pkg]   # dart pub get (one or all)
dart tool/test.dart [pkg]      # run Dart tests (one or all)
dart tool/analyze.dart [pkg]   # dart analyze --fatal-infos
dart tool/format.dart [pkg]    # dart format (add --check for CI mode)
dart tool/coverage.dart [pkg]  # tests + lcov coverage report
dart tool/codegen.dart [pkg]   # build_runner / ffigen code generation
dart tool/build_mlx.dart       # build CowMLX Swift library
dart tool/checks.dart          # full CI check (format → analyze → build → test → coverage)
```

Cow treats "model profiles" as the wiring layer between raw inference output and the app's message/tool semantics. Each profile defines three pieces:
- Prompt formatter – converts messages into a token sequence matching the model's chat template
- Stream parser – converts raw streamed tokens into structured `ModelOutput`
- Tool parser – extracts tool calls from model text output
Profiles live in:
- cow_brain model profiles – runtime behavior
- app model profiles – which models the app ships
- cow_model_manager – model file specs, registry, and installer logic
To add a new local model, implement the formatter/parser/extractor as needed, register the profile, and add tests. Profiles are thin wiring by design โ keep logic in the formatter/parser classes and keep the profile declarations mostly declarative.
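A hypothetical profile declaration, under the assumption that a profile is just metadata plus references to its formatter/parser logic; every name below is illustrative, not one of Cow's actual types:

```dart
/// Illustrative "thin wiring" profile: it carries metadata and points
/// at the formatter/extractor logic, but contains no logic itself.
class ModelProfile {
  final String id;
  final String Function(List<String> messages) promptFormatter;
  final List<String> Function(String text) toolCallExtractor;
  const ModelProfile({
    required this.id,
    required this.promptFormatter,
    required this.toolCallExtractor,
  });
}

// Hypothetical registration. In practice the closures would delegate
// to dedicated formatter/parser classes, keeping this declarative.
final qwenProfile = ModelProfile(
  id: 'qwen3-8b-4bit',
  promptFormatter: (msgs) => msgs.join('\n'), // placeholder logic
  toolCallExtractor: (text) => const [],      // placeholder logic
);
```

Keeping the profile this thin means tests target the formatter and parser classes directly, and adding a model is mostly a matter of registration.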
Cow is grateful to Alibaba Cloud for releasing the Qwen models under the permissive Apache 2.0 license. See the credits for the full license.
Cow itself is licensed under the permissive MIT license. Yee-haw!