Some examples of using Inspect AI to evaluate task performance:
├── data/ # Evaluation datasets
├── logs/ # Evaluation results and logs
├── classifier.py # Binary classification evaluation
├── intent_classifier.py # Intent classification evaluation
├── .env # API configuration
└── pyproject.toml # Project dependencies
- Tested on Python 3.13
- uv package manager
- Install dependencies:
  uv sync
- Create a .env file in the project root
- Add your API configuration:
  export BEDROCK_API_KEY=<bedrock_api_key>
  export BEDROCK_BASE_URL=https://bedrock-runtime.<aws_region>.amazonaws.com/openai/v1
- Source environment variables:
  source .env
- Run an evaluation:
  python <evaluation_file.py>
- View results:
  inspect view
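
An evaluation run can fail in a confusing way if `source .env` was skipped, so a small pre-flight check may help. The following is a minimal sketch, not part of this project: the helper name `missing_env_vars` is hypothetical, and the only thing taken from the setup steps above is the pair of variable names from the .env example.

```python
import os

# Variable names taken from the .env example in the setup steps.
REQUIRED_VARS = ("BEDROCK_API_KEY", "BEDROCK_BASE_URL")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the current process environment; passing a plain
    dict makes the check easy to exercise without touching os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

One way to use it would be to call `missing_env_vars()` at the top of an evaluation script and stop with a clear message if the returned list is non-empty, before any model request is made.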