Evals are a set of tests that allow us to measure the performance of the gpt-engineer system as a whole. This includes the gpt-engineer code, its options, and the chosen LLM.
To run the existing code evals, make sure you are in the gpt-engineer top-level directory (you should see a directory called `evals`) and type:
python evals/evals_existing_code.py
This will run the default test file `evals/existing_code_eval.yaml`. Alternatively, you can run any YAML file of tests you wish with the command: `python evals/evals_existing_code.py your_test_file.yaml`
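A test file is a YAML document describing the evaluations to run. The authoritative schema is whatever the eval scripts and the shipped `evals/existing_code_eval.yaml` define; the sketch below is only a hypothetical illustration, and every field name in it is invented for this example:

```yaml
# Hypothetical sketch of an eval test file.
# Field names are illustrative only; consult
# evals/existing_code_eval.yaml for the actual schema.
tests:
  - name: add_greeting_function            # identifier for this test case
    prompt: Add a greet() function that prints "hello"
    expected_results:
      - type: assert_exists_in_source_code # hypothetical check type
        file: main.py
        existing_string: def greet
```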
Similarly, to run the new code evals, type:
python evals/evals_new_code.py
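If you write your own test file, it can save a failed run to first check that the YAML parses at all. Below is a minimal sketch using PyYAML; the `check_test_file` helper and its printed summary are our own invention, not part of gpt-engineer:

```python
import sys

import yaml  # PyYAML (pip install pyyaml)

def check_test_file(path: str) -> None:
    """Parse a YAML test file, raising an error if it is not valid YAML."""
    with open(path) as f:
        data = yaml.safe_load(f)
    print(f"{path}: parsed OK ({type(data).__name__} at the top level)")

if __name__ == "__main__":
    # Default to the shipped test file; pass your own path to check it instead.
    check_test_file(sys.argv[1] if len(sys.argv) > 1 else "evals/existing_code_eval.yaml")
```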