Catalyst AI Scientist

A tool for semi-autonomous scientific research and discovery.

What Catalyst Can Do

Catalyst provides three main workflows:

  1. Autonomously develop a theory to explain a given phenomenon
  2. Take a user-provided theory draft, fill in any gaps, and auto-correct mistakes and oversights
  3. Choose from a menu of pre-defined operations: review a theory for correctness, propose corrections and refinements, perform experimental validations, etc.

Suitable Problems

Catalyst helps develop explanations for observable phenomena.

Suitable problems for the autonomous theory development workflow are of the shape:

  • "When we do Y, we observe X. What is the mechanism that causes X?"
  • "We sometimes see X while doing Y. Under what conditions does X happen, and why?"
  • "Explain what happens when we do X."

In short, it aims to answer "Why" questions that lead to testable predictions.

The current implementation of Catalyst is designed to work for problems that can be understood through computational experiments and mathematical derivation. Our testing so far has been limited to problems in the field of machine learning / deep learning theory.

You will have a higher chance of success if:

  • The phenomenon is described precisely and with little room for interpretation
  • You're able to provide simplifying assumptions or limit the scope of the investigation upfront. E.g. "Only consider linear networks with the following loss function: ..."
  • The phenomenon can be reproduced and probed through programmatic experimentation, i.e. by a piece of Python code that you can run on your computer.
  • You can describe the shape of the explanation that you are looking for. E.g. "I'm looking for an analytical explanation that makes exact predictions about property Y", or "I'm looking for an empirically validated approximation that holds in the value range A-B".
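
As an illustration of a phenomenon that is "reproduced by a piece of Python code", consider the sketch below. The task (a two-parameter linear network trained with gradient descent), the function names, and every hyperparameter are hypothetical and chosen only for illustration; none of them come from Catalyst.

```python
import random


def train_scalar_linear_net(init_scale, steps=2000, lr=0.05, seed=0):
    """Fit f(x) = w2 * w1 * x to the target y = x; return the loss per step."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    w1 = init_scale * rng.uniform(0.5, 1.0)
    w2 = init_scale * rng.uniform(0.5, 1.0)
    losses = []
    for _ in range(steps):
        err = w1 * w2 - 1.0  # the target product of the weights is 1
        losses.append(0.5 * err ** 2)
        grad_w1 = err * w2
        grad_w2 = err * w1
        w1 -= lr * grad_w1
        w2 -= lr * grad_w2
    return losses


def steps_until(losses, threshold=0.01):
    """Index of the first step whose loss is below threshold, or None."""
    for step, loss in enumerate(losses):
        if loss < threshold:
            return step
    return None
```

Comparing `steps_until(train_scalar_linear_net(1.0))` with `steps_until(train_scalar_linear_net(0.01))` yields a precise, repeatable observation (a smaller initialization delays convergence) that can then be posed as a "why" question with a limited scope.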

What Catalyst is Not a Good Fit For

Catalyst is not a good fit for:

  • Optimization problems, e.g. "find the optimal hyperparameters for training this ANN", "discover a more optimal matrix multiplication algorithm", or "find a function that maximizes metric X"
  • Problems with subjective or under-specified success criteria, e.g. "develop a theoretical framework for overfitting in deep learning"
  • Engineering problems, e.g. "build an operating system for microcontrollers", "design an efficient HTML rendering engine"
  • Problems that are significantly out of reach for the underlying base model, e.g. "Prove or disprove P=NP", "Unify quantum physics and general relativity into a practically testable theory of everything"
  • Problems that require experiments that can't be run on a computer (life sciences, psychology, experimental physics, etc.)
  • Problems that require significant computational resources to solve. Catalyst limits the runtime of any single experiment to no more than 30 minutes. Furthermore, it presently does not particularly optimize for compute and/or experiment efficiency.
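
To check ahead of time whether your own experiment code fits within the 30-minute limit mentioned above, you could run it under a timeout. The following is a minimal sketch using only the Python standard library; the function name `run_experiment` is hypothetical, and this is not how Catalyst itself enforces the limit.

```python
import subprocess
import sys

EXPERIMENT_TIMEOUT_SECONDS = 30 * 60  # the 30-minute cap stated above


def run_experiment(code, timeout=EXPERIMENT_TIMEOUT_SECONDS):
    """Run a Python snippet in a child process.

    Returns (finished, stdout); finished is False if the timeout was hit.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return False, ""
    return True, result.stdout
```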

Why Choose Catalyst Over a Bare LLM Chat or Coding Agent?

Catalyst does not replace LLM Chat interfaces or off-the-shelf coding agents. Those remain a better fit for interactive, conversational exploration of a topic, and for any problems that don't fit the criteria mentioned above.

While Catalyst is built on top of those same LLMs, it adds unique techniques that allow it to produce results beyond the capabilities of the raw model and harness:

  • Catalyst implements adversarial review-refinement loops: One set of agents continuously improves the generated theory, while separate, independent agents are tasked with falsifying its statements and identifying its limits.
  • Catalyst deploys an evolution-inspired system to build a population of competing theories. The theories are repeatedly ranked against each other and checked against empirical data. The most promising theories are selected for further refinement.
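
The rank, select, and refine cycle described above can be sketched in a few lines. This is a toy illustration, not Catalyst's implementation: here a "theory" is just a slope, the reviewer is a mean-squared-error check against made-up data, and the refiner is random perturbation. In Catalyst, these roles are played by LLM agents working on written theories. All names below are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Theory:
    slope: float  # toy "theory": predicts y = slope * x
    score: float = 0.0


DATA = [(x, 2.0 * x) for x in range(1, 6)]  # made-up empirical observations


def score_against_data(theory):
    """Reviewer stand-in: negative mean squared error (higher is better)."""
    return -sum((theory.slope * x - y) ** 2 for x, y in DATA) / len(DATA)


def refine(theory, rng):
    """Refiner stand-in: propose a perturbed variant of a surviving theory."""
    return Theory(slope=theory.slope + rng.gauss(0.0, 0.3))


def evolve(population, generations=5, survivors=2, seed=0):
    """Repeatedly rank the theories, keep the best, and refine the survivors."""
    rng = random.Random(seed)
    for _ in range(generations):
        for theory in population:
            theory.score = score_against_data(theory)
        ranked = sorted(population, key=lambda t: t.score, reverse=True)
        parents = ranked[:survivors]
        population = parents + [refine(p, rng) for p in parents]
    for theory in population:
        theory.score = score_against_data(theory)
    return max(population, key=lambda t: t.score)
```

Because the surviving parents are carried over unchanged into each new generation, the best score in this sketch can never get worse from one generation to the next.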

Getting Started

  1. Before using Catalyst, carefully review the "Supported Models & Estimated Costs" section below.
  2. git clone https://github.com/imbue-ai/catalyst.git && cd catalyst
  3. git checkout stable to use the stable branch
  4. Install prerequisites
  5. cd src && ./run.sh
  6. Follow the Quickstart Guide for next steps.

Supported Models & Estimated Costs

Catalyst utilizes an existing agentic harness installed on your system. It currently supports the following harnesses:

  • Claude Code (either via claude -p or via mngr)
  • Gemini CLI (via gemini -p)
  • Antigravity CLI (via agy -p)
  • Codex CLI (via codex exec)
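
To see which of these harnesses are available on your machine, you could check your PATH for the commands named above. This sketch assumes that the executables are called `claude`, `mngr`, `gemini`, `agy`, and `codex`, as the invocations in the list suggest. Catalyst's own detection logic may differ.

```python
import shutil

# Executable names are assumptions taken from the invocations listed above.
HARNESS_COMMANDS = {
    "Claude Code": "claude",
    "Claude Code (via mngr)": "mngr",
    "Gemini CLI": "gemini",
    "Antigravity CLI": "agy",
    "Codex CLI": "codex",
}


def find_installed_harnesses(which=shutil.which):
    """Map each harness whose command is on PATH to the resolved path."""
    found = {}
    for name, command in HARNESS_COMMANDS.items():
        path = which(command)
        if path is not None:
            found[name] = path
    return found


if __name__ == "__main__":
    for name, path in find_installed_harnesses().items():
        print(f"{name}: {path}")
```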

Token usage will be billed directly by the provider (Anthropic, Google, or OpenAI), based on the harness' existing authentication.

Before using Catalyst, please familiarize yourself with the expected costs listed below. The evolution-based workflows in particular are frequently composed of >100 subagents, and can incur significant token usage.

Tip

About 65% of tokens in a typical Develop Theory workflow are used for review & scoring steps, 25% for theory development, and 10% for miscellaneous. You can reduce your cost by configuring "Step Type Model Overrides", and using the strongest model only for theory development steps. Review & scoring and miscellaneous steps can often work with a slightly weaker model without significantly impacting the quality of your results.
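
As a rough worked example of the tip above, the sketch below blends per-model costs by token share. It assumes that cost scales linearly with the share of tokens and that the estimated "Develop Theory (Evolution)" costs for Claude Code with API billing (~$1,000 USD for Opus 4.8, ~$500 USD for Sonnet 4.6) reflect the models' relative prices. Both are simplifications, and the names in the code are hypothetical.

```python
# Token shares of a typical Develop Theory workflow, from the tip above.
TOKEN_SHARE = {
    "review_scoring": 0.65,
    "theory_development": 0.25,
    "misc": 0.10,
}


def blended_cost(full_workflow_cost_by_step):
    """Estimate total cost when each step type uses a different model.

    full_workflow_cost_by_step maps a step type to the estimated cost of
    running the ENTIRE workflow on the model chosen for that step type.
    """
    return sum(
        TOKEN_SHARE[step] * cost
        for step, cost in full_workflow_cost_by_step.items()
    )


estimate = blended_cost({
    "theory_development": 1000,  # Opus 4.8 (assumed full-workflow cost in USD)
    "review_scoring": 500,       # Sonnet 4.6
    "misc": 500,                 # Sonnet 4.6
})
```

Under these assumptions the estimate is 0.25 × 1,000 + 0.65 × 500 + 0.10 × 500, or roughly $625 USD instead of ~$1,000 USD.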

The costs shown below are rough estimates (order of magnitude), and will vary significantly depending on your research task. Even when using a subscription, extra charges may apply after you exhaust your plan's rate limits depending on your configuration (Anthropic Usage Credits, Gemini AI Credits etc.). Please monitor your provider's spend dashboard to avoid unwanted surprises.

Harness | Can use subscription plan? | Runs in sandbox | Model | Cost per "Develop Theory (Evolution)" | Cost per "Develop Theory (Linear)" | Cost per manual step
Claude Code | No | Yes | Opus 4.8 | ~$1,000 USD | ~$200 USD | ~$20 USD
Claude Code | No | Yes | Sonnet 4.6 | ~$500 USD | ~$100 USD | ~$10 USD
Claude Code | No | Yes | Haiku 4.5 | ~$150 USD | ~$30 USD | ~$3 USD
Claude Code (via mngr) | Yes, Max 20x recommended | Yes | Opus 4.8 | included in subscription (1-2 per week with Max 20x); ~$1,000 USD when using API billing | included in subscription (~5 per week with Max 20x); ~$200 USD when using API billing | included in subscription; ~$20 USD when using API billing
Claude Code (via mngr) | Yes, Max 20x recommended | Yes | Sonnet 4.6 | included in subscription; ~$500 USD when using API billing | included in subscription; ~$100 USD when using API billing | included in subscription; ~$10 USD when using API billing
Claude Code (via mngr) | Yes, Max 20x recommended | Yes | Haiku 4.5 | included in subscription; ~$150 USD when using API billing | included in subscription; ~$30 USD when using API billing | included in subscription; ~$3 USD when using API billing
Gemini CLI | No | Yes | 3.5 Flash | ~$200 USD | ~$40 USD | ~$4 USD
Gemini CLI | No | Yes | 3.1 Pro | ~$300 USD | ~$60 USD | ~$6 USD
Gemini CLI | No | Yes | 3 Flash | ~$100 USD | ~$20 USD | ~$2 USD
Antigravity CLI | Yes, AI Ultra recommended | No | 3.5 Flash | included in subscription; ~$200 USD when using API billing | included in subscription; ~$40 USD when using API billing | included in subscription; ~$4 USD when using API billing
Antigravity CLI | Yes, AI Ultra recommended | No | 3.1 Pro | included in subscription; ~$300 USD when using API billing | included in subscription; ~$60 USD when using API billing | included in subscription; ~$6 USD when using API billing
Codex CLI | Yes, Pro 20x recommended | Yes | GPT 5.5 | included in subscription; ~$500 USD when using API billing | included in subscription; ~$100 USD when using API billing | included in subscription; ~$10 USD when using API billing

Further Documentation

Additional information can be found in the following guides:

  • Setup: Prerequisites, setup & troubleshooting instructions.
  • Quickstart Guide: An overview of the system structure and how to run your research.
  • Mid-Research Steering: How to steer the direction of an ongoing research task.
  • Workflows and Add-ons: A reference for all primary workflows and individual add-on steps.
  • CLI Agent Usage: Instructions for using AI Scientist skills directly within a CLI agent.

Contributors

Catalyst is built by your friends at Imbue.
