GitHub - openinterpreter/benchmarks-v0 at c92acf20f3507923223b5a7b68d8685f9daf8da7

worker
.gitignore
Dockerfile
README.md
benchmark.py
commands.py
gaia.py
requirements.txt
run_benchmarks.py
test.py

This repo is used to run various AI benchmarks on the Open Interpreter project. Only GAIA is currently supported, and its image tasks are broken for now (I'll fix them soon, promise).

Setup

Make sure the software used by the commands below (git, Python, and Docker) is installed on your computer.

Copy-paste the following lines into your terminal if you're feeling dangerous.

git clone https://github.com/imapersonman/oi-benchmarks.git \
  && cd oi-benchmarks \
  && python -m venv .venv \
  && source .venv/bin/activate \
  && python -m pip install -r requirements.txt \
  && docker build -t worker .
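Before running the commands above, it can help to confirm that the tools they call are actually on your PATH. The following is a minimal sketch, not part of this repo: the `missing_tools` helper and the `REQUIRED_TOOLS` list are hypothetical, and the tool names are inferred from the setup commands rather than taken from an official prerequisite list.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Assumption: these names are inferred from the setup commands above.
# On some systems the Python executable is called `python3` instead.
REQUIRED_TOOLS = ["git", "python", "docker"]

missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("All required tools found.")
```

This only checks that each executable can be found. It does not check versions or whether the Docker daemon is running.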

Running Benchmarks

This section assumes that oi-benchmarks (downloaded via git in the previous section) is the current working directory, and that you've activated the virtualenv containing the installed prerequisite packages.

Example: gpt-3.5-turbo, first 16 GAIA tasks, 8 docker containers

This command will output a file called output.csv containing the results of the benchmark.

python run_benchmarks.py \
  --command gpt35turbo \
  --ntasks 16 \
  --nworkers 8

--command gpt35turbo: Replace gpt35turbo with any existing key in the commands Dict in commands.py. Defaults to gpt35turbo.
--ntasks 16: Grabs the first 16 GAIA tasks to run. Defaults to all 165 GAIA validation tasks.
--nworkers 8: Number of docker containers to run at once. Defaults to whatever max_workers defaults to when constructing a ThreadPoolExecutor.
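The defaults described above can be sketched with `argparse` and `ThreadPoolExecutor`. This is an illustration written from the option descriptions only, not the actual contents of run_benchmarks.py: `build_parser`, `select_tasks`, and `run_all` are hypothetical names, and the worker function stands in for whatever launches a docker container. Passing `max_workers=None` lets `ThreadPoolExecutor` pick its own default, which in Python 3.8 and later is derived from the CPU count (`min(32, os.cpu_count() + 4)` as of 3.8).

```python
import argparse
from concurrent.futures import ThreadPoolExecutor


def build_parser():
    """Hypothetical parser mirroring the three documented options."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--command", default="gpt35turbo")
    # None means "run every task" / "let the executor choose".
    parser.add_argument("--ntasks", type=int, default=None)
    parser.add_argument("--nworkers", type=int, default=None)
    return parser


def select_tasks(tasks, ntasks):
    """Take the first `ntasks` tasks, or all of them when ntasks is None."""
    return tasks if ntasks is None else tasks[:ntasks]


def run_all(tasks, worker, nworkers=None):
    """Run `worker` on each task using at most `nworkers` threads."""
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        # pool.map preserves the input order of the tasks.
        return list(pool.map(worker, tasks))
```

For example, `build_parser().parse_args(["--ntasks", "16", "--nworkers", "8"])` yields `command="gpt35turbo"`, `ntasks=16`, and `nworkers=8`, matching the example command above.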
