A tool for semi-autonomous scientific research and discovery.
Catalyst provides three main workflows:
- Autonomously develop a theory to explain a given phenomenon
- Take a user-provided theory draft, fill in any gaps, and auto-correct mistakes and oversights
- Choose from a menu of pre-defined operations: review a theory for correctness, propose corrections and refinements, perform experimental validations, etc.
Catalyst helps develop explanations for observable phenomena.
Suitable problems for the autonomous theory development workflow are of the shape:
- "When we do Y, we observe X. What is the mechanism that causes X?"
- "We sometimes see X while doing Y. Under what conditions does X happen, and why?"
- "Explain what happens when we do X."
In short, it aims to answer "Why" questions that lead to testable predictions.
The current implementation of Catalyst is designed to work for problems that can be understood through computational experiments and mathematical derivation. Our testing so far has been limited to problems in the field of machine learning / deep learning theory.
You will have a higher chance of success if:
- The phenomenon is described precisely and with little room for interpretation
- You're able to provide simplifying assumptions or limit the scope of the investigation upfront. E.g. "Only consider linear networks with the following loss function: ..."
- The phenomenon can be reproduced and probed through programmatic experimentation, i.e. reproduced by a piece of Python code that you can run on your computer.
- You can describe the shape of the explanation that you are looking for. E.g. "I'm looking for an analytical explanation that makes exact predictions about property Y.", or "I'm looking for an empirically validated approximation that holds in the value range A-B."
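As an illustration of a phenomenon that is stated precisely, scoped upfront, and reproducible with a short Python script, consider this hypothetical problem statement (it is not taken from Catalyst's documentation or test suite): "When we run gradient descent from zero initialization on an underdetermined linear least-squares problem, the weights converge to the minimum-norm solution. Why?" A minimal reproduction sketch, using only the standard library, could look like this:

```python
# Hypothetical example phenomenon (not from the Catalyst docs):
# gradient descent from zero init on an underdetermined linear problem
# converges to the minimum-norm interpolating solution.

def gradient_descent(a, y, lr=0.05, steps=2000):
    """Minimize 0.5 * (a . w - y)^2 over w, starting from w = 0."""
    w = [0.0] * len(a)
    for _ in range(steps):
        residual = sum(ai * wi for ai, wi in zip(a, w)) - y
        w = [wi - lr * residual * ai for ai, wi in zip(a, w)]
    return w

def min_norm_solution(a, y):
    """Closed form for a single equation a . w = y: w = a * y / ||a||^2."""
    norm_sq = sum(ai * ai for ai in a)
    return [ai * y / norm_sq for ai in a]

a, y = [1.0, 2.0, 3.0], 7.0  # one equation, three unknowns
w_gd = gradient_descent(a, y)
w_mn = min_norm_solution(a, y)
max_gap = max(abs(g - m) for g, m in zip(w_gd, w_mn))
print(f"max |w_gd - w_min_norm| = {max_gap:.2e}")
```

The script fixes the scope (a single linear equation, zero initialization) and exposes one measurable quantity, `max_gap`, that an explanation has to account for.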
Catalyst is not a good fit for:
- Optimization problems, e.g. "find the optimal hyperparameters for training this ANN", "discover a more efficient matrix multiplication algorithm", or "find a function that maximizes metric X"
- Problems with subjective or under-specified success criteria, e.g. "develop a theoretical framework for overfitting in deep learning"
- Engineering problems, e.g. "build an operating system for microcontrollers", "design an efficient HTML rendering engine"
- Problems that are significantly out of reach for the underlying base model, e.g. "Prove or disprove P=NP", "Unify quantum physics and general relativity into a practically testable theory of everything"
- Problems that require experiments that can't be run on a computer (life sciences, psychology, experimental physics, etc.)
- Problems that require significant computational resources to solve. Catalyst limits the runtime of any single experiment to 30 minutes. It also does not currently optimize for compute or experiment efficiency.
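Because any single experiment is limited to 30 minutes, it can help to check your own reproduction script against a time budget before handing the problem to Catalyst. The sketch below is one way to do that locally with the standard library. It does not describe how Catalyst enforces its limit, and `run_with_budget` is a hypothetical helper.

```python
import subprocess
import sys

EXPERIMENT_LIMIT_SECONDS = 30 * 60  # the 30-minute cap stated above

def run_with_budget(code, limit_seconds=EXPERIMENT_LIMIT_SECONDS):
    """Run a Python snippet in a subprocess; return True if it finishes in time.

    Hypothetical helper for local pre-checks, not part of Catalyst.
    """
    try:
        subprocess.run([sys.executable, "-c", code], timeout=limit_seconds, check=True)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return False

# A trivial experiment finishes; a long sleep exceeds a 1-second budget.
print(run_with_budget("print(sum(range(1000)))"))
print(run_with_budget("import time; time.sleep(5)", limit_seconds=1))
```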
Catalyst does not replace LLM Chat interfaces or off-the-shelf coding agents. Those remain a better fit for interactive, conversational exploration of a topic, and for any problems that don't fit the criteria mentioned above.
While Catalyst is built on top of those same LLMs, it adds unique techniques that allow it to produce results beyond the capabilities of the raw model and harness:
- Catalyst implements adversarial review-refinement loops: One set of agents continuously improves the generated theory, while separate, independent agents are tasked with falsifying its statements and identifying its limits.
- Catalyst deploys an evolution-inspired system to build a population of competing theories. The theories are repeatedly ranked against each other and checked against empirical data. The most promising theories are selected for further refinement.
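The evolution-inspired idea can be illustrated with a toy selection loop. This is a minimal sketch under heavy assumptions, not Catalyst's implementation: a "theory" here is just a (slope, intercept) pair for a line, the "empirical data" is a small fixed list of points, and refinement is a random perturbation, whereas Catalyst uses LLM agents for development, review, and ranking. The names `score`, `refine`, and `evolve` are hypothetical.

```python
import random

# Toy "empirical data": observations generated by y = 2x + 1.
DATA = [(x, 2.0 * x + 1.0) for x in range(-5, 6)]

def score(theory):
    """Mean squared error of a (slope, intercept) theory against DATA; lower is better."""
    slope, intercept = theory
    return sum((slope * x + intercept - y) ** 2 for x, y in DATA) / len(DATA)

def refine(theory, rng, scale=0.3):
    """Stand-in for a refinement step: perturb the theory randomly."""
    return tuple(p + rng.gauss(0.0, scale) for p in theory)

def evolve(generations=200, population_size=20, survivors=5, seed=0):
    rng = random.Random(seed)
    population = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        # Rank competing theories against the data and keep the most promising.
        population.sort(key=score)
        elite = population[:survivors]
        best_per_generation.append(score(elite[0]))
        # Refill the population with refined copies of the survivors.
        children = [refine(rng.choice(elite), rng)
                    for _ in range(population_size - survivors)]
        population = elite + children
    return min(population, key=score), best_per_generation

best, history = evolve()
print(f"best theory: slope={best[0]:.2f}, intercept={best[1]:.2f}, mse={score(best):.4f}")
```

Because the surviving theories are carried over unchanged, the best score in this sketch never gets worse from one generation to the next.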
- Before using Catalyst, carefully review the "Supported Models & Estimated Costs" section below.
- `git clone https://github.com/imbue-ai/catalyst.git && cd catalyst`
- `git checkout stable` to use the stable branch
- Install prerequisites
- `cd src && ./run.sh`
- Follow the Quickstart Guide for next steps.
Catalyst utilizes an existing agentic harness installed on your system. It currently supports the following harnesses:
- Claude Code (either via `claude -p` or via mngr)
- Gemini CLI (via `gemini -p`)
- Antigravity CLI (via `agy -p`)
- Codex CLI (via `codex exec`)
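To see which of these harnesses are available before you start, you can check for their executables on your `PATH`. This is a minimal sketch that assumes the binaries are named as in the commands above (`claude`, `gemini`, `agy`, `codex`). It is not how Catalyst itself discovers harnesses, and `find_harnesses` is a hypothetical helper.

```python
import shutil

# Executable names taken from the invocation commands listed above.
HARNESS_BINARIES = {
    "Claude Code": "claude",
    "Gemini CLI": "gemini",
    "Antigravity CLI": "agy",
    "Codex CLI": "codex",
}

def find_harnesses():
    """Map each harness name to the path of its executable, or None if not on PATH."""
    return {name: shutil.which(binary) for name, binary in HARNESS_BINARIES.items()}

for name, path in find_harnesses().items():
    print(f"{name}: {path or 'not found'}")
```

Finding an executable only shows that the CLI is installed. It says nothing about whether the harness is authenticated with its provider.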
Token usage will be billed directly by the provider (Anthropic, Google, or OpenAI), based on the harness' existing authentication.
Before using Catalyst, please familiarize yourself with the expected costs listed below. The evolution-based workflows in particular frequently involve more than 100 subagents and can incur significant token usage.
Tip: About 65% of tokens in a typical Develop Theory workflow are used for review & scoring steps, 25% for theory development, and 10% for miscellaneous. You can reduce your cost by configuring "Step Type Model Overrides", and using the strongest model only for theory development steps. Review & scoring and miscellaneous steps can often work with a slightly weaker model without significantly impacting the quality of your results.
The costs shown below are rough estimates (order of magnitude), and will vary significantly depending on your research task. Even when using a subscription, extra charges may apply after you exhaust your plan's rate limits depending on your configuration (Anthropic Usage Credits, Gemini AI Credits etc.). Please monitor your provider's spend dashboard to avoid unwanted surprises.
| Harness | Can use subscription plan? | Runs in sandbox? | Model | Cost per "Develop Theory (Evolution)" | Cost per "Develop Theory (Linear)" | Cost per manual step |
|---|---|---|---|---|---|---|
| Claude Code | No | Yes | Opus 4.8 | ~$1,000 USD | ~$200 USD | ~$20 USD |
| Claude Code | No | Yes | Sonnet 4.6 | ~$500 USD | ~$100 USD | ~$10 USD |
| Claude Code | No | Yes | Haiku 4.5 | ~$150 USD | ~$30 USD | ~$3 USD |
| Claude Code (via mngr) | Yes, Max 20x recommended | Yes | Opus 4.8 | included in subscription (1-2 per week with Max 20x); ~$1,000 USD when using API billing | included in subscription (~5 per week with Max 20x); ~$200 USD when using API billing | included in subscription; ~$20 USD when using API billing |
| Claude Code (via mngr) | Yes, Max 20x recommended | Yes | Sonnet 4.6 | included in subscription; ~$500 USD when using API billing | included in subscription; ~$100 USD when using API billing | included in subscription; ~$10 USD when using API billing |
| Claude Code (via mngr) | Yes, Max 20x recommended | Yes | Haiku 4.5 | included in subscription; ~$150 USD when using API billing | included in subscription; ~$30 USD when using API billing | included in subscription; ~$3 USD when using API billing |
| Gemini CLI | No | Yes | 3.5 Flash | ~$200 USD | ~$40 USD | ~$4 USD |
| Gemini CLI | No | Yes | 3.1 Pro | ~$300 USD | ~$60 USD | ~$6 USD |
| Gemini CLI | No | Yes | 3 Flash | ~$100 USD | ~$20 USD | ~$2 USD |
| Antigravity CLI | Yes, AI Ultra recommended | No | 3.5 Flash | included in subscription; ~$200 USD when using API billing | included in subscription; ~$40 USD when using API billing | included in subscription; ~$4 USD when using API billing |
| Antigravity CLI | Yes, AI Ultra recommended | No | 3.1 Pro | included in subscription; ~$300 USD when using API billing | included in subscription; ~$60 USD when using API billing | included in subscription; ~$6 USD when using API billing |
| Codex CLI | Yes, Pro 20x recommended | Yes | GPT 5.5 | included in subscription; ~$500 USD when using API billing | included in subscription; ~$100 USD when using API billing | included in subscription; ~$10 USD when using API billing |
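
As a worked example of the tip on Step Type Model Overrides, the sketch below estimates the cost of a mixed-model run. It assumes that cost scales linearly with each step type's share of tokens (65% review & scoring, 25% theory development, 10% miscellaneous) and plugs in the rough API-billing estimates for "Develop Theory (Evolution)" with Claude Code. Both the linearity assumption and the helper name `blended_cost` are illustrative, so treat the result as an order-of-magnitude guess.

```python
# Rough token shares per step type, from the tip above.
TOKEN_SHARE = {"review_scoring": 0.65, "theory_development": 0.25, "misc": 0.10}

# Rough cost (USD) of a full "Develop Theory (Evolution)" run on a single model.
FULL_RUN_COST = {"opus": 1000, "sonnet": 500, "haiku": 150}

def blended_cost(model_per_step):
    """Estimate run cost when each step type uses its own model.

    Assumes cost is proportional to token share, which is a simplification.
    """
    return sum(TOKEN_SHARE[step] * FULL_RUN_COST[model]
               for step, model in model_per_step.items())

# Strongest model for theory development only, a weaker one elsewhere.
overrides = {"theory_development": "opus", "review_scoring": "sonnet", "misc": "sonnet"}
print(f"~${blended_cost(overrides):.0f} USD")  # 0.25*1000 + 0.65*500 + 0.10*500 = 625
```

Under these assumptions, moving review & scoring and miscellaneous steps to the mid-tier model brings the estimate from roughly $1,000 down to roughly $625.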
Additional information can be found in the following guides:
- Setup: Prerequisites, setup & troubleshooting instructions.
- Quickstart Guide: An overview of the system structure and how to run your research.
- Mid-Research Steering: How to steer the direction of an ongoing research task.
- Workflows and Add-ons: A reference for all primary workflows and individual add-on steps.
- CLI Agent Usage: Instructions for using AI Scientist skills directly within a CLI agent.
Catalyst is built by your friends at Imbue.
