How good are language models at coming up with new algorithms? To try to answer this, we built AlgoTune, a benchmark comprising 154 widely used math, physics, and computer science functions. For each function, the goal is to write code that produces the same outputs as the original, while being faster. In addition to the benchmark, we provide AlgoTuner, an agent that lets language models easily optimize code.
✨ New: AlgoTune can now be easily run on AWS with just an OpenRouter API key and AWS credentials. Try it out!
```bash
# 1. Install dependencies
pip install -e .  # or, if you prefer conda:
# conda create -n algotune python=3.10
# conda activate algotune
# pip install -e .

# 2. Add your API key
echo "OPENROUTER_API_KEY=your_key_here" > .env

# Ask an LM to optimise a task (here: svm) with model "o4-mini"
./algotune.sh --standalone agent o4-mini svm

# View the aggregated speed-up report
cat reports/agent_summary.json
```

When `sbatch` is available the launcher auto-detects SLURM.
```bash
# Run AlgoTuner on all tasks
./algotune.sh agent o4-mini

# Results are summarised in:
cat reports/agent_summary.json
```

Running AlgoTune on AWS is simple and requires only a minimal setup.
AWS CLI - Install if not already available:

```bash
pip install awscli
```
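If your AWS credentials are not set up yet, configuring them with the standard AWS CLI commands should work (a minimal sketch; the checks below are not part of the AlgoTune scripts):

```bash
# Enter your access key, secret key, and default region when prompted
aws configure

# Sanity check: this should print your account ID and user ARN
aws sts get-caller-identity
```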
AWS IAM Policy - Create and attach this policy to your IAM user:

- Go to IAM → Policies → Create policy
- Click the JSON tab and paste:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sts:GetCallerIdentity",
        "batch:*",
        "ecr:*",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeVpcs",
        "ec2:DescribeRouteTables",
        "ec2:DescribeVpcEndpoints",
        "ec2:CreateVpcEndpoint",
        "ec2:ModifyVpcEndpoint",
        "ec2:DescribeInstances",
        "ec2:RunInstances",
        "ec2:TerminateInstances",
        "ec2:CreateTags",
        "s3:CreateBucket",
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "iam:PassRole",
        "iam:GetRole",
        "iam:CreateRole",
        "iam:AttachRolePolicy",
        "iam:CreateInstanceProfile",
        "iam:AddRoleToInstanceProfile",
        "iam:GetInstanceProfile",
        "iam:CreateServiceLinkedRole",
        "logs:GetLogEvents",
        "ecs:ListTasks"
      ],
      "Resource": "*"
    }
  ]
}
```

- Click Next and name it (e.g., `AlgoTuneBatchPolicy`)
- Click Create policy
- Go to IAM → Users → [your user] → Permissions → Add permissions → Attach policies directly
- Search for `AlgoTuneBatchPolicy` and attach it
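Alternatively, if you prefer the CLI, creating and attaching the policy should look roughly like this (a sketch; the `policy.json` filename, user name, and account ID are placeholders):

```bash
# Assumes the JSON above has been saved locally as policy.json
aws iam create-policy --policy-name AlgoTuneBatchPolicy --policy-document file://policy.json
aws iam attach-user-policy --user-name YOUR_IAM_USER \
  --policy-arn arn:aws:iam::YOUR_ACCOUNT_ID:policy/AlgoTuneBatchPolicy
```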
If you have restricted permissions, manually create:

- S3 bucket: `algotune-results-{your-account-id}` in your region
- ECR repository: `algotune` in your region
- VPC resources: Note your subnet ID and security group ID
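For the first two, the standard AWS CLI calls look roughly like this (account ID and region below are placeholders, not project defaults):

```bash
# Create the results bucket and the ECR repository in your region
aws s3 mb s3://algotune-results-YOUR_ACCOUNT_ID --region us-east-1
aws ecr create-repository --repository-name algotune --region us-east-1
```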
```bash
# One-time setup
./aws/setup-aws.sh      # Interactive AWS configuration

# Launch jobs
./aws/launch-batch.sh   # Interactive: select model and tasks
```

Extract the best code for each model/task:
```bash
python3 scripts/extract_results_from_logs.py
```

Or generate HTML logs in the style of AlgoTune.io:
```bash
./html/build-html.sh
```

You can add code for each task in directories (following the `./results/` structure) and it will be compiled and evaluated; see the sketch below. Note that you have to generate the datasets first.
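For example, adding an external solution might look like this (the model directory, task name, and `solver.py` filename are assumptions for illustration; mirror whatever convention the existing `./results/` entries use):

```bash
# Hypothetical: add a hand-written solution for the "svm" task
mkdir -p "results/My Model/svm"
cp path/to/my_solution.py "results/My Model/svm/solver.py"
```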
```bash
# Evaluate all models in ./results
./algotune.sh evaluate

# Evaluate specific models
./algotune.sh evaluate --models "Claude Opus 4" "o4-mini"

# View aggregated speedup results
cat reports/evaluate_summary.json
```

AlgoTuner streams datasets from Hugging Face. For offline runs, generate them locally first:
```bash
# Example: generate the dataset for one task (svm) with a 100 ms target
./algotune.sh --standalone generate --target-time-ms 100 --tasks svm

# Generate all datasets with a 250 ms target
./algotune.sh --standalone generate --target-time-ms 250
```

If you found this work helpful, please consider citing it:
```bibtex
@article{press2025algotune,
  title={AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?},
  author={Press, Ori and Amos, Brandon and Zhao, Haoyu and Wu, Yikai and Ainsworth, Samuel K. and Krupke, Dominik and Kidger, Patrick and Sajed, Touqir and Stellato, Bartolomeo and Park, Jisun and Bosch, Nathanael and Meril, Eli and Steppi, Albert and Zharmagambetov, Arman and Zhang, Fangzhao and Perez-Pineiro, David and Mercurio, Alberto and Zhan, Ni and Abramovich, Talor and Lieret, Kilian and Zhang, Hanlin and Huang, Shirley and Bethge, Matthias and Press, Ofir},
  journal={arXiv preprint arXiv:2507.15887},
  year={2025},
  doi={10.48550/arXiv.2507.15887},
  url={https://arxiv.org/abs/2507.15887}
}
```

Feel free to write me at me@oripress.com.