Catalyst AI Scientist

A tool for semi-autonomous scientific research and discovery.

What Catalyst Can Do

Catalyst provides three main workflows:

Autonomously develop a theory to explain a given phenomenon
Take a user-provided theory draft, fill in any gaps, and auto-correct mistakes and oversights
Choose from a menu of pre-defined operations: review a theory for correctness, propose corrections and refinements, perform experimental validations, etc.

Suitable Problems

Catalyst helps develop explanations for observable phenomena.

Suitable problems for the autonomous theory development workflow are of the shape:

"When we do Y, we observe X. What is the mechanism that causes X?"
"We sometimes see X while doing Y. Under what conditions does X happen, and why?"
"Explain what happens when we do X."

In short, it aims to answer "Why" questions that lead to testable predictions.

The current implementation of Catalyst is designed to work for problems that can be understood through computational experiments and mathematical derivation. Our testing so far has been limited to problems in the field of machine learning / deep learning theory.

You will have a higher chance of success if:

The phenomenon is described precisely and with little room for interpretation
You're able to provide simplifying assumptions or limit the scope of the investigation upfront. E.g. "Only consider linear networks with the following loss function: ..."
The phenomenon can be reproduced and probed through programmatic experimentation, i.e. by a piece of Python code that you can run on your computer.
You can describe the shape of the explanation that you are looking for. E.g. "I'm looking for an analytical explanation that makes exact predictions about property Y.", or "I'm looking for an empirically validated approximation that holds in the value range A-B."
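
As an illustration of a phenomenon that is precisely stated, narrowly scoped, and reproducible in a few lines of Python, consider the following hypothetical example. It is not taken from the Catalyst repository; the function name and parameter values are assumptions chosen for illustration. Plain gradient descent on the one-dimensional quadratic loss L(w) = 0.5 * a * w^2 converges for small learning rates and diverges for large ones.

```python
def final_distance(curvature: float, lr: float, steps: int = 50, w0: float = 1.0) -> float:
    """Run gradient descent on L(w) = 0.5 * curvature * w**2 and return |w| at the end."""
    w = w0
    for _ in range(steps):
        grad = curvature * w  # dL/dw for the quadratic loss
        w = w - lr * grad     # plain gradient descent update
    return abs(w)


if __name__ == "__main__":
    curvature = 4.0
    for lr in (0.1, 0.4, 0.6):
        distance = final_distance(curvature, lr)
        print(f"lr={lr}: |w| after 50 steps = {distance:.3e}")
```

A matching problem statement could read: "When we run gradient descent on this loss with a large learning rate, the iterates diverge. Under what conditions does this happen, and why?" The scope (a single quadratic loss) and the shape of the desired explanation (an exact threshold on the learning rate) are both stated upfront.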

What Catalyst is Not a Good Fit For

Catalyst is not a good fit for:

Optimization problems, e.g. "find the optimal hyperparameters for training this ANN", "discover a more optimal matrix multiplication algorithm", or "find a function that maximizes metric X"
Problems with subjective or under-specified success criteria, e.g. "develop a theoretical framework for overfitting in deep learning"
Engineering problems, e.g. "build an operating system for microcontrollers", "design an efficient HTML rendering engine"
Problems that are significantly out of reach for the underlying base model, e.g. "Prove or disprove P=NP", "Unify quantum physics and general relativity into a practically testable theory of everything"
Problems that require experiments that can't be run on a computer (life sciences, psychology, experimental physics, etc.)
Problems that require significant computational resources to solve. Catalyst limits the runtime of any single experiment to 30 minutes. Furthermore, it does not currently optimize for compute or experiment efficiency.
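
To get a feel for whether an experiment fits the 30-minute limit mentioned above, you could time it locally before handing the problem to Catalyst. The following is a minimal sketch using only the Python standard library. It is not how Catalyst enforces its limit, and the helper name `fits_budget` is an assumption made for this example.

```python
import subprocess
import sys

BUDGET_SECONDS = 30 * 60  # the 30-minute per-experiment limit stated above


def fits_budget(command: list[str], budget_seconds: float = BUDGET_SECONDS) -> bool:
    """Run `command` and report whether it finished within the time budget.

    Only the runtime is checked; a non-zero exit code still counts as "finished".
    """
    try:
        subprocess.run(command, timeout=budget_seconds, check=False)
    except subprocess.TimeoutExpired:
        return False  # the child process was killed after exceeding the budget
    return True


if __name__ == "__main__":
    # Hypothetical stand-in for your own experiment script.
    quick_experiment = [sys.executable, "-c", "print('experiment done')"]
    print(fits_budget(quick_experiment))
```

Replace the stand-in command with the command that runs your own experiment.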

Why Choose Catalyst Over a Bare LLM Chat or Coding Agent?

Catalyst does not replace LLM Chat interfaces or off-the-shelf coding agents. Those remain a better fit for interactive, conversational exploration of a topic, and for any problems that don't fit the criteria mentioned above.

While Catalyst is built on top of those same LLMs, it adds unique techniques that allow it to produce results beyond the capabilities of the raw model and harness:

Catalyst implements adversarial review-refinement loops: One set of agents continuously improves the generated theory, while separate, independent agents are tasked with falsifying its statements and identifying its limits.
Catalyst deploys an evolution-inspired system to build a population of competing theories. The theories are repeatedly ranked against each other and checked against empirical data. The most promising theories are selected for further refinement.
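
The rank-select-refine cycle described above can be pictured with a toy sketch. This is not Catalyst's implementation: here a "theory" is just a candidate slope for the model y = slope * x, "empirical data" is a handful of synthetic points, and "refinement" is a random perturbation. All names and parameter values are assumptions, and the adversarial review agents are not modeled.

```python
import random

# Toy "empirical data": observations generated by the hidden rule y = 3 * x.
DATA = [(x, 3.0 * x) for x in range(1, 6)]


def error(slope: float) -> float:
    """Score a candidate theory 'y = slope * x' by its mean squared error on DATA."""
    return sum((slope * x - y) ** 2 for x, y in DATA) / len(DATA)


def evolve(generations: int = 30, population_size: int = 8,
           survivors: int = 3, seed: int = 0) -> list[float]:
    """Return the best error in each generation (index 0 is the initial population)."""
    rng = random.Random(seed)
    population = [rng.uniform(-10.0, 10.0) for _ in range(population_size)]
    history = [min(error(s) for s in population)]
    for _ in range(generations):
        # Rank theories against the data and keep the most promising ones.
        ranked = sorted(population, key=error)[:survivors]
        # Refine survivors by small random perturbations; survivors stay in the pool.
        children = [rng.choice(ranked) + rng.gauss(0.0, 0.5)
                    for _ in range(population_size - survivors)]
        population = ranked + children
        history.append(min(error(s) for s in population))
    return history


if __name__ == "__main__":
    history = evolve()
    print(f"best error: first generation {history[0]:.4f}, last generation {history[-1]:.4f}")
```

Because the selected survivors are carried over unchanged, the best error in this sketch can never get worse from one generation to the next.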

Getting Started

Before using Catalyst, carefully review the "Supported Models & Estimated Costs" section below.
git clone https://github.com/imbue-ai/catalyst.git && cd catalyst
git checkout stable (to use the stable branch)
Install the prerequisites (see the Setup guide)
cd src && ./run.sh
Follow the Quickstart Guide for next steps.

Supported Models & Estimated Costs

Catalyst utilizes an existing agentic harness installed on your system. It currently supports the following harnesses:

Claude Code (either via claude -p or via mngr)
Gemini CLI (via gemini -p)
Antigravity CLI (via agy -p)
Codex CLI (via codex exec)
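
A quick way to see which of these harnesses are installed is to look for their executables on your PATH. The sketch below is not part of Catalyst. It uses the executable names from the invocations listed above and only checks that a binary is present, not that it is authenticated or correctly configured.

```python
import shutil

# Executable names taken from the invocations listed above.
HARNESS_BINARIES = {
    "Claude Code": "claude",
    "Claude Code (via mngr)": "mngr",
    "Gemini CLI": "gemini",
    "Antigravity CLI": "agy",
    "Codex CLI": "codex",
}


def detect_harnesses() -> dict[str, bool]:
    """Map each harness name to whether its executable is found on PATH."""
    return {name: shutil.which(binary) is not None
            for name, binary in HARNESS_BINARIES.items()}


if __name__ == "__main__":
    for name, found in detect_harnesses().items():
        print(f"{name}: {'found' if found else 'not found'}")
```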

Token usage will be billed directly by the provider (Anthropic, Google, or OpenAI), based on the harness' existing authentication.

Before using Catalyst, please familiarize yourself with the expected costs listed below. The evolution-based workflows in particular are frequently composed of >100 subagents, and can incur significant token usage.

Tip

About 65% of tokens in a typical Develop Theory workflow are used for review & scoring steps, 25% for theory development, and 10% for miscellaneous. You can reduce your cost by configuring "Step Type Model Overrides", and using the strongest model only for theory development steps. Review & scoring and miscellaneous steps can often work with a slightly weaker model without significantly impacting the quality of your results.
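
The effect of such overrides can be estimated with simple arithmetic. The sketch below rests on two loudly stated assumptions: each step type's share of the cost equals its share of tokens, and a weaker model consumes the same number of tokens as a stronger one. The function and dictionary names are hypothetical. The dollar figures are the rough "Develop Theory (Evolution)" estimates for Claude Code from the cost table in this section.

```python
# Token shares from the tip above.
TOKEN_SHARE = {"review_scoring": 0.65, "theory_development": 0.25, "misc": 0.10}


def blended_cost(full_run_cost: dict[str, float], model_for_step: dict[str, str]) -> float:
    """Estimate a run's cost when each step type uses its own model.

    full_run_cost maps a model name to the estimated cost of running the
    whole workflow on that model alone.
    """
    return sum(share * full_run_cost[model_for_step[step]]
               for step, share in TOKEN_SHARE.items())


if __name__ == "__main__":
    # Rough order-of-magnitude estimates in USD, taken from the cost table.
    costs = {"Opus 4.8": 1000.0, "Sonnet 4.6": 500.0}
    overrides = {
        "theory_development": "Opus 4.8",
        "review_scoring": "Sonnet 4.6",
        "misc": "Sonnet 4.6",
    }
    print(f"estimated cost: ~${blended_cost(costs, overrides):.0f} USD")
```

Under these assumptions the estimate works out to 0.25 * 1000 + 0.65 * 500 + 0.10 * 500 = 625, i.e. roughly $625 USD instead of roughly $1,000 USD. Treat this as an order-of-magnitude figure, like the table entries it is derived from.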

The costs shown below are rough, order-of-magnitude estimates and will vary significantly depending on your research task. Even when using a subscription, extra charges may apply after you exhaust your plan's rate limits, depending on your configuration (Anthropic Usage Credits, Gemini AI Credits, etc.). Please monitor your provider's spend dashboard to avoid unwanted surprises.

Harness	Can use subscription plan?	Runs in sandbox	Model	Cost per "Develop Theory (Evolution)"	Cost per "Develop Theory (Linear)"	Cost per manual step
Claude Code	No	Yes	Opus 4.8	~$1,000 USD	~$200 USD	~$20 USD
			Sonnet 4.6	~$500 USD	~$100 USD	~$10 USD
			Haiku 4.5	~$150 USD	~$30 USD	~$3 USD
Claude Code (via mngr)	Yes, Max 20x recommended	Yes	Opus 4.8	included in subscription (1-2 per week with Max 20x); ~$1,000 USD when using API billing	included in subscription (~5 per week with Max 20x); ~$200 USD when using API billing	included in subscription; ~$20 USD when using API billing
			Sonnet 4.6	included in subscription; ~$500 USD when using API billing	included in subscription; ~$100 USD when using API billing	included in subscription; ~$10 USD when using API billing
			Haiku 4.5	included in subscription; ~$150 USD when using API billing	included in subscription; ~$30 USD when using API billing	included in subscription; ~$3 USD when using API billing
Gemini CLI	No	Yes	3.5 Flash	~$200 USD	~$40 USD	~$4 USD
			3.1 Pro	~$300 USD	~$60 USD	~$6 USD
			3 Flash	~$100 USD	~$20 USD	~$2 USD
Antigravity CLI	Yes, AI Ultra recommended	No	3.5 Flash	included in subscription; ~$200 USD when using API billing	included in subscription; ~$40 USD when using API billing	included in subscription; ~$4 USD when using API billing
			3.1 Pro	included in subscription; ~$300 USD when using API billing	included in subscription; ~$60 USD when using API billing	included in subscription; ~$6 USD when using API billing
Codex CLI	Yes, Pro 20x recommended	Yes	GPT 5.5	included in subscription; ~$500 USD when using API billing	included in subscription; ~$100 USD when using API billing	included in subscription; ~$10 USD when using API billing

Further Documentation

Additional information can be found in the following guides:

Setup: Prerequisites, setup & troubleshooting instructions.
Quickstart Guide: An overview of the system structure and how to run your research.
Mid-Research Steering: How to steer the direction of an ongoing research task.
Workflows and Add-ons: A reference for all primary workflows and individual add-on steps.
CLI Agent Usage: Instructions for using AI Scientist skills directly within a CLI agent.

Contributors

Catalyst is built by your friends at Imbue.
