HalfSeed

Multi-agent LLM research orchestrator · local-first · self-hosted · emits a real paper.tex

Watch a multi-agent AI research group debate in a pixel-art lab — and walk away with a real LaTeX paper.

[Figure: the HalfSeed pixel-art lab room, with the Director, specialists, Skeptic, Curator, Writer, Referee, and Presentation agents at their stations; sprites animate as agents hand off artifacts to each other.]

HalfSeed orchestrates a small team of LLM agents (a director, specialists, a skeptic, a curator, a writer, and a referee) that propose, critique, and synthesize research artifacts on a topic you give them. The web UI shows the agents' handoffs as a live pixel-art scene, lets you approve or reject each artifact, and renders the curated output into a versioned paper.tex you can compile to PDF.

It is a working public-alpha tool for research-style explorations, and an honest probe of what current LLMs can and can't sustain over many iterations.

What you get in the browser:

  • Architecture: edit the per-project agent graph, role tags, provider routing, prompts, and handoff edges.
  • Research Map: inspect the evolving argument spine, branch pivots, evidence ledger, and why the current direction became the main line.
  • Paper: paper.tex is regenerated every iteration, with % PI-LOCK regions to preserve your hand-written prose. The Paper view renders it alongside an outline sidebar.
  • Threads: every open research thread the agents spawned, grouped by status (open, blocked, closed-solved, closed-killed); the Director picks the highest-priority open thread to attack next.

What it actually does

You ─── give a research question ───►  Director plans an iteration
                                            │
                                            ▼
                              ┌─────────────┴─────────────┐
                              ▼             ▼             ▼
                          Analytical   Numerical    Literature   ← parallel specialists
                              │             │             │
                              └─────────────┬─────────────┘
                                            ▼
                                         Skeptic    ← critiques every claim
                                            │
                                            ▼
                                         Curator   ← ranks, dedupes, drops rejected
                                            │
                                            ▼
                                          Writer   ← emits paper.tex edits
                                            │
                                            ▼
                                         Referee   ← workspace-aware paper review
                                            │                  (reads paper + figures
                                            │                   + data + citations)
                                            ▼
                                  paper.tex  +  slides.tex  +  briefing.md
                                            │
                                            ▼
                                  You approve / reject / write new directives
                                            │
                                            └────► next iteration

Every iteration is persisted in SQLite, every paper edit is git-committed, and PI-LOCK regions in paper.tex survive regeneration so your hand-written sections aren't overwritten.
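
As a rough illustration of how locked regions could survive regeneration, here is a minimal sketch. It assumes the markers are literal `% PI-LOCK BEGIN` and `% PI-LOCK END` comment lines and that regions are matched between the old and new file by their order; both are assumptions, and `merge_pi_locks` is a hypothetical name, not HalfSeed's actual function.

```python
import re

# Assumption: a locked region runs from a "% PI-LOCK BEGIN" line through
# the next "% PI-LOCK END" line, inclusive.
LOCK_RE = re.compile(
    r"^% PI-LOCK BEGIN.*?^% PI-LOCK END[^\n]*",
    re.DOTALL | re.MULTILINE,
)


def merge_pi_locks(old_tex: str, new_tex: str) -> str:
    """Copy locked regions from old_tex into new_tex, matched by order."""
    remaining = iter(LOCK_RE.findall(old_tex))

    def restore(match: re.Match) -> str:
        # If the old file had fewer locked regions, keep the new one as-is.
        return next(remaining, match.group(0))

    # A function replacement is inserted literally, so LaTeX backslashes
    # in the hand-written prose are not treated as regex escapes.
    return LOCK_RE.sub(restore, new_tex)
```

Matching by order is the simplest choice; a real implementation might instead give each region an identifier so that reordered sections still line up.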

Why HalfSeed and not [insert framework]

| | HalfSeed | LangGraph / CrewAI / AutoGen |
|---|---|---|
| Goal | A research artefact you can read and cite | A reusable framework |
| Output | LaTeX paper, slides, briefings, audit trail | Whatever you build |
| Internal critique | Built-in skeptic + curator + PI override | DIY |
| Iteration model | Stateful project with directives, threads, approvals | Stateless graph runs |
| Visualization | Architecture graph with live highlights, pixel lab replay | Logs |
| Backend | Provider-neutral: Anthropic (Claude, native), OpenAI, DeepSeek (lowest-cost default), Ollama, and any OpenAI-compatible endpoint | Same |

If you want a generic agent runtime, use one of the others. If you want a notebook for AI-assisted research that produces real documents, HalfSeed is shaped for that.

Quickstart

Option A: Docker (recommended)

The fastest way to try HalfSeed without installing Python, Node, or LaTeX locally.

git clone https://github.com/HaibaraAi137/HalfSeed.git
cd HalfSeed
cp .env.example .env       # add your provider API key (Claude / DeepSeek / etc.; or skip for offline mode)
docker compose up

Then open http://127.0.0.1:5173/.

Option B: Local install

You'll need Python 3.11+, Node 20+, and (optional) latexmk + xelatex for PDF compilation.

git clone https://github.com/HaibaraAi137/HalfSeed.git
cd HalfSeed
python -m pip install -e ".[sandbox]"
cp .env.example .env       # add your provider API key (Claude / DeepSeek / etc.)
( cd frontend && npm install )
./start.sh                 # starts backend on :8000 and Vite on :5173

The sandbox extra installs the scientific Python stack used by Numerical agent code artifacts (numpy, scipy, sympy, matplotlib, pandas, h5py, networkx). Docker includes the same stack automatically.

On native Windows PowerShell, run the backend and frontend in two terminals:

# Terminal 1
$env:HALFSEED_RUN_DIR = "runs"
python -m halfseed serve --host 127.0.0.1 --port 8000

# Terminal 2
cd frontend
npm.cmd install
npm.cmd run dev

If python opens the Microsoft Store or cannot be accessed, install Python 3.11+ from python.org and make sure it appears before the WindowsApps shim on PATH.

With ./start.sh, stop with ./stop.sh; logs land in logs/backend.log and logs/frontend.log. In the two-terminal PowerShell flow, stop both processes with Ctrl+C.

Pick an LLM provider

HalfSeed talks to any OpenAI-compatible /chat/completions endpoint, and to Anthropic through its native /v1/messages API, so you can mix and match. Set HALFSEED_PROVIDER to one of:

| Provider | API key needed | Best for |
|---|---|---|
| anthropic (Claude) | ANTHROPIC_API_KEY | Frontier reasoning; recommended for non-trivial research questions. Native /v1/messages integration, no proxy required. Supports Claude's extended thinking. |
| deepseek (lowest-cost default) | DEEPSEEK_API_KEY | Reasoning model with thinking mode; cheapest. |
| openai | OPENAI_API_KEY | OpenAI-compatible hosted models. |
| ollama | none (runs locally) | Privacy / fully offline; install Ollama, then run ollama pull llama3.1. |

Every option only needs you to edit .env. See .env.example for ready-to-uncomment templates.
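
If you want to check your environment before launching, a small preflight helper along these lines may be useful. It is a sketch, not part of HalfSeed: `missing_key` is a hypothetical name, the provider-to-key mapping is copied from the provider list above, and falling back to `deepseek` when HALFSEED_PROVIDER is unset is an assumption based on it being described as the default. It also ignores the HALFSEED_API_KEY fallback listed under Configuration.

```python
import os
from typing import Mapping, Optional

# Key variable per provider, copied from the provider list; ollama needs none.
REQUIRED_KEY = {
    "anthropic": "ANTHROPIC_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "openai": "OPENAI_API_KEY",
    "ollama": None,
}


def missing_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the unset key variable, or None if nothing is missing."""
    # Assumption: deepseek is used when HALFSEED_PROVIDER is not set.
    provider = env.get("HALFSEED_PROVIDER", "deepseek")
    if provider not in REQUIRED_KEY:
        raise ValueError(f"unknown provider: {provider!r}")
    key = REQUIRED_KEY[provider]
    if key is None or env.get(key):
        return None
    return key
```

Calling `missing_key()` with no arguments inspects the current process environment; passing a plain dict makes it easy to test.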

Running with Claude (recommended for harder research questions):

Just set your key in .env:

HALFSEED_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
HALFSEED_MODEL=claude-sonnet-4-6        # or claude-opus-4-7 / claude-haiku-4-5-...

HalfSeed talks to Anthropic's /v1/messages API directly through its native backend. The orchestration layer above it is the same as for every other provider, so the multi-agent orchestration, the Skeptic / Curator / Referee loop, and the paper.tex output are identical. Extended thinking is enabled automatically when HALFSEED_THINKING=enabled and HALFSEED_MAX_TOKENS is large enough to leave room for the visible output.

Five-minute smoke test (no API key)

The CLI ships with a deterministic offline backend that runs the full project pipeline (agents → curator → writer → presentation) using canned responses, so you can see the shape of the output before spending a token. This produces a real project on disk:

mkdir my-runs && cd my-runs
python -m halfseed init demo "Can a scalar EFT preserve shift symmetry?"
python -m halfseed run demo --offline --no-latex-gate

# What just happened? Read the briefing the Director wrote:
python -m halfseed brief demo

# The paper draft is in runs/demo/paper.tex (offline mode fills it
# with deterministic placeholder prose so you can see the structure).
ls runs/demo/

The numbers in briefing.md come from offline canned data — confidence scores, artifact counts, and the "Stop reason" are not real research output. The point is to verify the install works and to show what artifact files end up where.
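
To make "deterministic canned responses" concrete, here is a minimal sketch of what such a backend can look like. Everything in it is hypothetical (the class name, the `complete` method, and the canned text); it only illustrates the idea that the same role and prompt always yield the same reply without any network call.

```python
import hashlib

# Hypothetical canned replies per role; the real offline backend's text differs.
CANNED = {
    "director": ["Plan: split the question into two threads."],
    "skeptic": ["Claim 1 lacks a derivation.", "Claim 2 needs a citation."],
}


class OfflineBackend:
    """Returns the same reply for the same (role, prompt) pair, with no network."""

    def complete(self, role: str, prompt: str) -> str:
        replies = CANNED.get(role, ["[offline placeholder]"])
        # A content hash (not Python's per-process salted hash()) keeps the
        # choice stable across runs and machines.
        digest = hashlib.sha256(f"{role}\n{prompt}".encode("utf-8")).digest()
        return replies[digest[0] % len(replies)]
```

Because the reply depends only on the inputs, two offline runs of the same project produce the same artifacts, which is what makes the smoke test repeatable.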

Note: the legacy one-shot command python -m halfseed --offline "<question>" prints a single markdown memo to stdout without creating a project and without running the iteration loop. It exists for quick sanity checks; for anything beyond that, use the init + run flow above.

When you're ready to run with a real model, add your API key to .env and re-run python -m halfseed run demo without the --offline flag. The project on disk is the same; only the agent backend changes.

How to use it (web UI)

  1. Start the server (./start.sh) and open http://127.0.0.1:5173/.
  2. Create a project with your research question (English or Chinese).
  3. Optionally edit the architecture: drag agents around, change their missions, disable any you don't want. The "Run iteration" button is disabled while the graph has structural errors (a cycle, a missing required role, etc.), so you find out before launching.
  4. Run an iteration. The architecture graph highlights the active agent; the event stream shows handoffs in real time.
  5. Review artifacts: approve the good ones, reject the wrong ones (this teaches the Director to avoid dead ends next iteration).
  6. Write directives for the next round in plain text. They get fed to the Director on the next run.
  7. Compile the PDF from the Paper or PDF tab. Edit it directly inside % PI-LOCK BEGIN ... % PI-LOCK END regions and your edits survive future iterations.
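
The structural checks mentioned in step 3 (cycles, missing required roles) can be sketched as follows. This is an illustration under stated assumptions, not HalfSeed's validator: `validate_graph` is a hypothetical name, the set of required roles is made up for the example, and the cycle check uses Kahn's topological-sort algorithm.

```python
from collections import deque

# Assumption: which roles count as "required" is illustrative only.
REQUIRED_ROLES = {"director", "curator", "writer"}


def validate_graph(roles: set[str], edges: list[tuple[str, str]]) -> list[str]:
    """Return a list of structural errors; an empty list means the graph can run."""
    errors = [f"missing required role: {r}" for r in sorted(REQUIRED_ROLES - roles)]

    indegree = {r: 0 for r in roles}
    children: dict[str, list[str]] = {r: [] for r in roles}
    for src, dst in edges:
        if src not in roles or dst not in roles:
            errors.append(f"edge references unknown agent: {src} -> {dst}")
            continue
        children[src].append(dst)
        indegree[dst] += 1

    # Kahn's algorithm: repeatedly remove nodes that have no incoming edges.
    queue = deque(r for r, d in indegree.items() if d == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Any node never removed sits on a cycle or downstream of one.
    if visited < len(roles):
        errors.append("handoff graph contains a cycle")
    return errors
```

A UI could disable its run button whenever this list is non-empty and show the messages next to the graph.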

Configuration

The two things that matter most:

DEEPSEEK_API_KEY              your API key
HALFSEED_TIMEOUT_SECONDS      raise this (e.g. 600+) if you use thinking mode
                              with long Chinese / multi-thread prompts

Full list:

DEEPSEEK_API_KEY              DeepSeek API key
HALFSEED_API_KEY              provider-neutral fallback (used if DEEPSEEK_API_KEY unset)
HALFSEED_BASE_URL             default: https://api.deepseek.com
HALFSEED_MODEL                default: deepseek-v4-flash
HALFSEED_THINKING             enabled | disabled
HALFSEED_TEMPERATURE          default: 0.2
HALFSEED_MAX_TOKENS           default: 32000 (0 = uncapped; let provider use its own)
HALFSEED_TIMEOUT_SECONDS      default: 60 (raise for thinking mode)
HALFSEED_MAX_RETRIES          default: 2
HALFSEED_RETRY_DELAY_SECONDS  default: 0.5

CLI flags override env vars; env vars override .env.
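
The precedence rule can be pictured as a lookup through layered mappings. The sketch below is an assumption-laden illustration rather than HalfSeed's loader: `parse_dotenv` and `resolve` are hypothetical names, CLI values are assumed to be keyed by the same variable names, the .env parser is deliberately simplified, and the defaults are a subset of the list above.

```python
from typing import Mapping, Optional

# Defaults copied from the configuration list above (subset).
DEFAULTS = {
    "HALFSEED_TEMPERATURE": "0.2",
    "HALFSEED_MAX_TOKENS": "32000",
    "HALFSEED_TIMEOUT_SECONDS": "60",
    "HALFSEED_MAX_RETRIES": "2",
}


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blanks and comments (no quoting rules)."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing " # comment", as used in the .env snippets above.
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


def resolve(
    name: str,
    cli: Mapping[str, str],
    env: Mapping[str, str],
    dotenv: Mapping[str, str],
) -> Optional[str]:
    """First match wins: CLI flag, then environment, then .env, then default."""
    for source in (cli, env, dotenv, DEFAULTS):
        if name in source:
            return source[name]
    return None
```

For example, with HALFSEED_TIMEOUT_SECONDS=120 in .env and the same variable exported as 300 in the shell, `resolve` returns "300" unless a CLI value is supplied.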

Documentation

  • Docs index: current docs and historical design notes.
  • Troubleshooting: setup failures, port conflicts, provider issues, LaTeX, sandbox, and stale locks.
  • Local data and security: what stays local, what is sent to live providers, where uploads and transcripts are stored.
  • Architecture: backend layers, role routing, graph semantics, persistence, and publishing.
  • Roadmap: near-term priorities and out-of-scope work.
  • Security: local trust model, provider data flow, and vulnerability reporting.
  • Changelog: public-facing changes and release notes.
  • Code of Conduct: expected behaviour and enforcement guidelines for community spaces.

Live API reference

When the backend is running (./start.sh or docker compose up), FastAPI serves an auto-generated OpenAPI explorer at http://127.0.0.1:8000/docs (Swagger UI) and the raw schema at http://127.0.0.1:8000/openapi.json. Useful for building scripts or plug-ins against the project endpoints.
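
As a starting point for scripting, the sketch below downloads the schema and lists the operations it declares. It relies only on the standard OpenAPI layout (a top-level "paths" object keyed by path, then by HTTP method); the function names are hypothetical, and no specific HalfSeed endpoint is assumed.

```python
import json
from urllib.request import urlopen

SCHEMA_URL = "http://127.0.0.1:8000/openapi.json"  # URL from the paragraph above

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def fetch_schema(url: str = SCHEMA_URL, timeout: float = 10.0) -> dict:
    """Download and decode the OpenAPI schema; requires a running backend."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def list_operations(schema: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs from the schema, sorted by path then method."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for method in item:
            # Path items may also hold non-method keys such as "parameters".
            if method in HTTP_METHODS:
                operations.append((method.upper(), path))
    return sorted(operations, key=lambda op: (op[1], op[0]))
```

With the backend running, `list_operations(fetch_schema())` gives a quick inventory of the routes you can call from your own scripts.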

What HalfSeed is not

  • Not a general-purpose agent framework. It is opinionated about research workflow shape.
  • Not a guarantee that LLM output is correct. The skeptic and curator help, but you are still the PI; review claims and approve/reject deliberately.
  • Not a SaaS — every install is a single-user local deployment. Your research stays on your machine.
  • Not real-time. DeepSeek or Claude with thinking mode enabled plus a long prompt can take 1–3 minutes per LLM call. The Stop button cancels at the next checkpoint, not mid-call.

Project layout

src/halfseed/
  agents/       role definitions + Director (plan / supervise / stop)
  llm/          backend protocol, OpenAI-compatible HTTP backend (DeepSeek / OpenAI / Ollama / ...), native Anthropic /v1/messages backend, offline backend, multi-provider router
  workflow.py   director–specialist–skeptic–curator–writer–referee orchestration
  schema.py     structured research artifacts and run records
  publish/      LaTeX/Markdown templates with PI-LOCK regions
  qa/           latexmk gate, sandbox runner
  state/        SQLite + git project lifecycle
  server/       FastAPI backend + SSE event stream
  cli.py        legacy CLI + project subcommands
frontend/
  src/          React + Vite + React Flow + PixiJS visualizer
tests/          pytest suite (unit + server route + sandbox)

Contributing

Issues and pull requests are welcome. See CONTRIBUTING.md for what is in scope, what is not, and how to test changes locally.

License

MIT — see LICENSE.