Simple Bench

Run Instructions

Run benchmark:

python run_benchmark.py --model_name=gpt-4o --dataset_path=simple_bench_public.json

Clone the github repo and cd into it.

Make sure you have the correct python version (3.10.11) as a venv:

pyenv local 3.10.11
python -m venv llm_env
source llm_env/bin/activate

Install dependencies:

pip install -r requirements.txt

Create a .env file with the following:

OPENAI_API_KEY=<your key>
ANTHROPIC_API_KEY=<your key>
...

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
weave_utils		weave_utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
run_benchmark.py		run_benchmark.py
simple_bench_public.json		simple_bench_public.json
simple_bench_public_set.csv		simple_bench_public_set.csv