Convert eval log from Inspect AI into JSON format with the following command:

```bash
uv run inspect log convert path_to_eval_file_generated_by_inspect --to json --output-dir inspect_json
```

Then we can convert an Inspect evaluation log into the unified schema via `eval_converters/inspect/converter.py`. The conversion for the example data can be generated with the script below:

```bash
uv run python3 -m eval_converters.inspect.converter
```


### HELM

HELM already writes its evaluation logs as JSON files, so no separate format-conversion step is needed.

You can convert a HELM evaluation log into the unified schema via `eval_converters/helm/converter.py`. For example:

```bash
uv run python3 -m eval_converters.helm.converter --log_dirpath tests/data/helm
```

The automatic conversion script requires the following files generated by HELM to work correctly:
- `per_instance_stats.json`
- `run_spec.json`
- `scenario_state.json`
- `scenario.json`
- `stats.json`

A full manual for converting your own HELM evaluation log into the unified schema is available below:

```bash
usage: converter.py [-h] [--log_dirpath LOG_DIRPATH] [--huggingface_dataset HUGGINGFACE_DATASET] [--output_dir OUTPUT_DIR] [--source_organization_name SOURCE_ORGANIZATION_NAME]
[--evaluator_relationship {first_party,third_party,collaborative,other}] [--source_organization_url SOURCE_ORGANIZATION_URL]
[--source_organization_logo_url SOURCE_ORGANIZATION_LOGO_URL]

options:
-h, --help show this help message and exit
--log_dirpath LOG_DIRPATH
Path to a directory with a single evaluation or multiple evaluations to convert
--huggingface_dataset HUGGINGFACE_DATASET
--output_dir OUTPUT_DIR
--source_organization_name SOURCE_ORGANIZATION_NAME
Organization that pushed the evaluation to evalHub.
--evaluator_relationship {first_party,third_party,collaborative,other}
Relationship of evaluation author to the model
--source_organization_url SOURCE_ORGANIZATION_URL
--source_organization_logo_url SOURCE_ORGANIZATION_LOGO_URL
```
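The options above correspond to a standard argparse interface. For reference, here is a minimal illustrative sketch of an equivalent parser (not the repo's actual code), which can be handy when wrapping the converter programmatically:

```python
import argparse


def build_parser():
    """Mirror the converter's CLI surface shown in the usage text above."""
    parser = argparse.ArgumentParser(prog="converter.py")
    parser.add_argument(
        "--log_dirpath",
        help="Path to a directory with one or more evaluations to convert",
    )
    parser.add_argument("--huggingface_dataset")
    parser.add_argument("--output_dir")
    parser.add_argument(
        "--source_organization_name",
        help="Organization that pushed the evaluation to evalHub",
    )
    parser.add_argument(
        "--evaluator_relationship",
        choices=["first_party", "third_party", "collaborative", "other"],
        help="Relationship of the evaluation author to the model",
    )
    parser.add_argument("--source_organization_url")
    parser.add_argument("--source_organization_logo_url")
    return parser
```

Note that `--evaluator_relationship` is restricted to the four listed choices; any other value is rejected at parse time.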

### Tests

Run the commands below to run unit tests and lint checks for all evaluation platforms:

```bash
uv run pytest -s --disable-warnings
uv run ruff check
```