Multi-agent LLM research orchestrator · local-first · self-hosted · emits a real paper.tex
Watch a multi-agent AI research group debate in a pixel-art lab — and walk away with a real LaTeX paper.
HalfSeed orchestrates a small team of LLM agents (a director, specialists, a skeptic, a curator, a writer, and a referee) that propose, critique, and synthesize research artifacts on a topic you give them. The web UI shows the agents' handoffs as a live pixel-art scene, lets you approve or reject each artifact, and renders the curated output into a versioned paper.tex you can compile to PDF.
It is a working public-alpha tool for research-style explorations, and an honest probe of what current LLMs can and can't sustain over many iterations.
What you get in the browser:
You ─── give a research question ───► Director plans an iteration
│
▼
┌─────────────┴─────────────┐
▼ ▼ ▼
Analytical Numerical Literature ← parallel specialists
│ │ │
└─────────────┬─────────────┘
▼
Skeptic ← critiques every claim
│
▼
Curator ← ranks, dedupes, drops rejected
│
▼
Writer ← emits paper.tex edits
│
▼
Referee ← workspace-aware paper review
│ (reads paper + figures
│ + data + citations)
▼
paper.tex + slides.tex + briefing.md
│
▼
You approve / reject / write new directives
│
└────► next iteration
Every iteration is persisted in SQLite, every paper edit is git-committed, and PI-LOCK regions in paper.tex survive regeneration so your hand-written sections aren't overwritten.
| HalfSeed | LangGraph / CrewAI / AutoGen | |
|---|---|---|
| Goal | A research artefact you can read and cite | A reusable framework |
| Output | LaTeX paper, slides, briefings, audit trail | Whatever you build |
| Internal critique | Built-in skeptic + curator + PI override | DIY |
| Iteration model | Stateful project with directives, threads, approvals | Stateless graph runs |
| Visualization | Architecture graph with live highlights, pixel lab replay | Logs |
| Backend | Provider-neutral: Anthropic (Claude, native), OpenAI, DeepSeek (lowest-cost default), Ollama, and any OpenAI-compatible endpoint | Same |
If you want a generic agent runtime, use one of the others. If you want a notebook for AI-assisted research that produces real documents, HalfSeed is shaped for that.
The fastest way to try HalfSeed without installing Python, Node, or LaTeX locally.
git clone https://github.com/HaibaraAi137/HalfSeed.git
cd HalfSeed
cp .env.example .env # add your provider API key (Claude / DeepSeek / etc.; or skip for offline mode)
docker compose upThen open http://127.0.0.1:5173/.
You'll need Python 3.11+, Node 20+, and (optional) latexmk + xelatex for PDF compilation.
git clone https://github.com/HaibaraAi137/HalfSeed.git
cd HalfSeed
python -m pip install -e ".[sandbox]"
cp .env.example .env # add your provider API key (Claude / DeepSeek / etc.)
( cd frontend && npm install )
./start.sh # starts backend on :8000 and Vite on :5173The sandbox extra installs the scientific Python stack used by Numerical
agent code artifacts (numpy, scipy, sympy, matplotlib, pandas,
h5py, networkx). Docker includes the same stack automatically.
On native Windows PowerShell, run the backend and frontend in two terminals:
# Terminal 1
$env:HALFSEED_RUN_DIR = "runs"
python -m halfseed serve --host 127.0.0.1 --port 8000
# Terminal 2
cd frontend
npm.cmd install
npm.cmd run devIf python opens the Microsoft Store or cannot be accessed, install Python
3.11+ from python.org and make sure it appears before the WindowsApps shim
on PATH.
With ./start.sh, stop with ./stop.sh; logs land in logs/backend.log
and logs/frontend.log. In the two-terminal PowerShell flow, stop both
processes with Ctrl+C.
HalfSeed talks to any OpenAI-compatible /chat/completions endpoint, so you can mix and match. Set HALFSEED_PROVIDER to one of:
| Provider | API key needed | Best for |
|---|---|---|
anthropic (Claude) |
ANTHROPIC_API_KEY |
Frontier reasoning; recommended for non-trivial research questions. Native /v1/messages integration — no proxy required. Supports Claude's extended thinking. |
deepseek (lowest-cost default) |
DEEPSEEK_API_KEY |
Reasoning model with thinking mode; cheapest. |
openai |
OPENAI_API_KEY |
OpenAI-compatible hosted models. |
ollama |
none — runs locally | Privacy / fully offline; install Ollama then ollama pull llama3.1. |
Every option only needs you to edit .env. See .env.example for ready-to-uncomment templates.
Running with Claude (recommended for harder research questions):
Just set your key in .env:
HALFSEED_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
HALFSEED_MODEL=claude-sonnet-4-6 # or claude-opus-4-7 / claude-haiku-4-5-...
HalfSeed talks to Anthropic's /v1/messages API directly — same code path as every other provider — so the multi-agent orchestration, Skeptic / Curator / Referee loop, and paper.tex output are identical. Extended thinking is auto-enabled when HALFSEED_THINKING=enabled and HALFSEED_MAX_TOKENS is large enough to leave room for the visible output.
The CLI ships with a deterministic offline backend that runs the full project pipeline (agents → curator → writer → presentation) using canned responses, so you can see the shape of the output before spending a token. This produces a real project on disk:
mkdir my-runs && cd my-runs
python -m halfseed init demo "Can a scalar EFT preserve shift symmetry?"
python -m halfseed run demo --offline --no-latex-gate
# What just happened? Read the briefing the Director wrote:
python -m halfseed brief demo
# The paper draft is in runs/demo/paper.tex (offline mode fills it
# with deterministic placeholder prose so you can see the structure).
ls runs/demo/The numbers in briefing.md come from offline canned data — confidence
scores, artifact counts, and the "Stop reason" are not real research
output. The point is to verify the install works and to show what
artifact files end up where.
Note: the legacy one-shot command
python -m halfseed --offline "<question>"prints a single markdown memo to stdout without creating a project and without running the iteration loop. It exists for quick sanity checks; for anything beyond that, use theinit+runflow above.
When you're ready to run with a real model, drop the --offline flag,
add your API key to .env, and re-run halfseed run demo. The project
on disk is the same; only the agent backend changes.
- Start the server (
./start.sh) and open http://127.0.0.1:5173/. - Create a project with your research question (English or Chinese).
- Optionally edit architecture — drag agents around, change their missions, disable any you don't want. The "Run iteration" button is disabled if the graph has structural errors (cycle, missing required role, etc.) so you find out before launching.
- Run an iteration. The architecture graph highlights the active agent; the event stream shows handoffs in real time.
- Review artifacts: approve the good ones, reject the wrong ones (this teaches the Director to avoid dead ends next iteration).
- Write directives for the next round in plain text. They get fed to the Director on the next run.
- Compile the PDF from the Paper or PDF tab. Edit it directly inside
% PI-LOCK BEGIN ... % PI-LOCK ENDregions and your edits survive future iterations.
The two things that matter most:
DEEPSEEK_API_KEY your API key
HALFSEED_TIMEOUT_SECONDS raise this (e.g. 600+) if you use thinking mode
with long Chinese / multi-thread prompts
Full list:
DEEPSEEK_API_KEY DeepSeek API key
HALFSEED_API_KEY provider-neutral fallback (used if DEEPSEEK_API_KEY unset)
HALFSEED_BASE_URL default: https://api.deepseek.com
HALFSEED_MODEL default: deepseek-v4-flash
HALFSEED_THINKING enabled | disabled
HALFSEED_TEMPERATURE default: 0.2
HALFSEED_MAX_TOKENS default: 32000 (0 = uncapped; let provider use its own)
HALFSEED_TIMEOUT_SECONDS default: 60 (raise for thinking mode)
HALFSEED_MAX_RETRIES default: 2
HALFSEED_RETRY_DELAY_SECONDS default: 0.5
CLI flags override env vars; env vars override .env.
- Docs index: current docs and historical design notes.
- Troubleshooting: setup failures, port conflicts, provider issues, LaTeX, sandbox, and stale locks.
- Local data and security: what stays local, what is sent to live providers, where uploads and transcripts are stored.
- Architecture: backend layers, role routing, graph semantics, persistence, and publishing.
- Roadmap: near-term priorities and out-of-scope work.
- Security: local trust model, provider data flow, and vulnerability reporting.
- Changelog: public-facing changes and release notes.
- Code of Conduct: expected behaviour and enforcement guidelines for community spaces.
When the backend is running (./start.sh or docker compose up), FastAPI
serves an auto-generated OpenAPI explorer at
http://127.0.0.1:8000/docs (Swagger UI) and the raw schema at
http://127.0.0.1:8000/openapi.json. Useful for building scripts or
plug-ins against the project endpoints.
- Not a general-purpose agent framework. It is opinionated about research workflow shape.
- Not a guarantee that LLM output is correct. The skeptic and curator help, but you are still the PI; review claims and approve/reject deliberately.
- Not a SaaS — every install is a single-user local deployment. Your research stays on your machine.
- Not real-time. DeepSeek or Claude with thinking mode enabled plus a long prompt can take 1–3 minutes per LLM call. The Stop button cancels at the next checkpoint, not mid-call.
src/halfseed/
agents/ role definitions + Director (plan / supervise / stop)
llm/ backend protocol, OpenAI-compatible HTTP backend (DeepSeek / OpenAI / Ollama / ...), native Anthropic /v1/messages backend, offline backend, multi-provider router
workflow.py director–specialist–skeptic–curator–writer–referee orchestration
schema.py structured research artifacts and run records
publish/ LaTeX/Markdown templates with PI-LOCK regions
qa/ latexmk gate, sandbox runner
state/ SQLite + git project lifecycle
server/ FastAPI backend + SSE event stream
cli.py legacy CLI + project subcommands
frontend/
src/ React + Vite + React Flow + PixiJS visualizer
tests/ pytest suite (unit + server route + sandbox)
Issues and pull requests are welcome. See CONTRIBUTING.md for what is in scope, what is not, and how to test changes locally.
MIT — see LICENSE.




