forked from promptfoo/promptfoo
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
chore(examples): modernize image classification example (promptfoo#2197)
Showing 4 changed files with 111 additions and 75 deletions.
```diff
@@ -28,6 +28,7 @@
 "jailbreaking",
 "leaderboard",
 "leetspeak",
+"Lightbox",
 "Logform",
 "logprobs",
 "Mateo",
```
````diff
@@ -1,36 +1,43 @@
 # Image Classification Example with Promptfoo

-This example demonstrates how to use Promptfoo for image classification tasks using the Fashion MNIST dataset. It showcases prompt engineering, configuration, and evaluation of AI models for image analysis. We use a prompt designed to output XML and compare class labels from the dataset with the model's output. Additional attributes in the XML illustrate how to extract more information using multi-modal models. This example is set up to use Anthropic, but you can easily switch to GPT-4 or other models by modifying the provider in the config file. You may need to adjust the prompt to match your model's output format and experiment with different prompts to see how they affect performance.
+This example demonstrates how to use Promptfoo for image classification tasks using the Fashion MNIST dataset. The example uses GPT-4o and GPT-4o-mini with a structured json schema to analyze images, including classification, color analysis, and additional attributes.

 ## Getting Started

-1. Generate the dataset:
+1. Set up your OpenAI API key:

 ```sh
-python dataset_gen.py
+export OPENAI_API_KEY='your-api-key'
 ```

-Note: You may need to install dependencies with:
+2. Run the evaluation:

 ```sh
-pip install -r requirements.txt
+npx promptfoo@latest eval
 ```

-This script creates a CSV file with 100 random images from the Fashion MNIST dataset and their labels. A CSV with 10 sample images is included so you can skip this step if preferred.
+3. View the results:

-2. Run the evaluation:
+```sh
+npx promptfoo@latest view
+```

+4. Optionally, re-generate or update the dataset:
+
 ```sh
-npx promptfoo@latest eval
+python dataset_gen.py
 ```

-3. View the results:
+Note: You may need to install dependencies with:

 ```sh
-npx promptfoo@latest view
+pip install -r requirements.txt
 ```

-4. Modify the prompt to see how it affects the model's performance. For example, try:
-- adding `Begin with <analysis>` to the end of the prompt to make the `is-xml` assertion pass.
-- changing the prompt to output JSON instead of XML.
-- Modifying the prompt to include the classification within the xml `<classification>[T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot]</classification>`.
+This script creates a CSV file with 100 random images from the Fashion MNIST dataset and their labels. A CSV with 10 sample images is included so you can skip this step if preferred.
+
+5. Experiment with the configuration:
+- Modify the JSON schema in `promptfooconfig.yaml` to add or adjust required fields
+- Try different models such as llama3.2 or Claude 3.5 by changing the provider in the config
+- Adjust the system prompt to improve classification accuracy
+- Add additional assertions to validate model outputs
````
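The README points to `dataset_gen.py` for regenerating the 100-image CSV, but that script is not part of this diff. The following is a hypothetical, stdlib-only sketch of what such a step could look like. It assumes the images arrive as 28x28 grayscale pixel grids and that the CSV holds an `image` column (a base64-encoded PNG) plus a `label` column with the class name. The config does read `label` through `context.vars.label`, but the `image` column name is a guess, and loading Fashion MNIST itself is left out.

```python
"""Hypothetical sketch of a dataset_gen.py-style CSV builder (stdlib only).

Assumptions, not taken from the commit: images are 28x28 grayscale pixel
grids (lists of 0-255 ints), and the CSV uses an `image` column holding a
base64-encoded PNG and a `label` column holding the class name.
"""
import base64
import csv
import io
import random
import struct
import zlib

# Fashion MNIST class names, indexed by the dataset's integer labels 0-9.
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _png_chunk(tag: bytes, data: bytes) -> bytes:
    """Length, tag, data, then a CRC-32 over tag + data, as PNG requires."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def encode_png_gray(pixels: list[list[int]]) -> bytes:
    """Encode a 2D grid of 0-255 ints as an 8-bit grayscale PNG."""
    height, width = len(pixels), len(pixels[0])
    # Every scanline starts with filter byte 0 ("None").
    raw = b"".join(b"\x00" + bytes(row) for row in pixels)
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter method 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + _png_chunk(b"IHDR", ihdr)
        + _png_chunk(b"IDAT", zlib.compress(raw))
        + _png_chunk(b"IEND", b"")
    )


def write_sample_csv(samples, out, k: int = 100, seed: int = 0) -> None:
    """Write k randomly chosen (pixels, label_index) pairs to `out` as CSV rows."""
    rng = random.Random(seed)
    chosen = rng.sample(samples, min(k, len(samples)))
    writer = csv.writer(out)
    writer.writerow(["image", "label"])
    for pixels, label_index in chosen:
        image_b64 = base64.b64encode(encode_png_gray(pixels)).decode("ascii")
        writer.writerow([image_b64, CLASS_NAMES[label_index]])


if __name__ == "__main__":
    # Synthetic stand-ins for real Fashion MNIST images: one black, one white.
    demo = [([[0] * 28 for _ in range(28)], 9), ([[255] * 28 for _ in range(28)], 1)]
    buf = io.StringIO()
    write_sample_csv(demo, buf, k=2)
    print(buf.getvalue().splitlines()[0])  # prints: image,label
```

Hand-rolling the PNG encoder keeps the sketch dependency-free; a real script would more likely use an imaging library, which would also remove the need for the chunk and CRC bookkeeping.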
```diff
@@ -1,24 +1,72 @@
 # yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
 description: Image Classification Example of Fashion MNIST dataset
 providers:
-  - id: anthropic:messages:claude-3-5-sonnet-20241022
-    label: claude-3.5-sonnet
+  - openai:chat:gpt-4o
+  - openai:chat:gpt-4o-mini
 prompts:
-  - id: file://prompt.js
-    label: Image Classification
-tests: file://fashion_mnist_sample_base64.csv
+  - label: Image Classification
+    raw: file://prompt.js
+    config:
+      response_format:
+        type: json_schema
+        json_schema:
+          name: image_classification
+          schema:
+            type: object
+            properties:
+              classification:
+                type: string
+                enum:
+                  [
+                    'T-shirt/top',
+                    'Trouser',
+                    'Pullover',
+                    'Dress',
+                    'Coat',
+                    'Sandal',
+                    'Shirt',
+                    'Sneaker',
+                    'Bag',
+                    'Ankle boot',
+                  ]
+              color:
+                type: string
+              features:
+                type: string
+              style:
+                type: string
+              confidence:
+                type: integer
+              reasoning:
+                type: string
+            required:
+              - classification
+              - color
+              - features
+              - style
+              - confidence
+              - reasoning
+            additionalProperties: false
 defaultTest:
   assert:
-    - type: contains-xml
+    - type: is-json
       value:
-        requiredElements:
-          - analysis.classification
-          - analysis.color
-          - analysis.confidence
-          - analysis.features
-          - analysis.reasoning
-          - analysis.style
-    - type: is-xml
-      value: 'analysis.classification,analysis.color,analysis.features,analysis.style,analysis.confidence,analysis.reasoning'
-    - type: contains
-      value: '<classification>{{label}}</classification>'
+        type: object
+        properties:
+          classification:
+            type: string
+          color:
+            type: string
+          features:
+            type: string
+          style:
+            type: string
+          confidence:
+            type: integer
+          reasoning:
+            type: string
+    - type: javascript
+      value: 'output.classification === context.vars.label'
+      metric: accuracy
+
+tests: file://fashion_mnist_sample_base64.csv
```
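The new config drops the XML checks in favor of OpenAI structured output (`response_format` of type `json_schema`) plus two assertions: `is-json` against a schema, and a `javascript` check that compares the `classification` field with the test's `label`, recorded under the `accuracy` metric. The sketch below restates that logic in plain Python to make it concrete. `check_output` and `accuracy` are hypothetical helpers, not promptfoo APIs. They follow the stricter schema sent in `response_format` (enum, required fields, no extra properties), and the type checks are simplified.

```python
"""Sketch of the new config's checks, re-expressed in plain Python.

`check_output` and `accuracy` are hypothetical helpers, not promptfoo APIs.
They mirror the `response_format` JSON schema and the `javascript` assertion
`output.classification === context.vars.label`; type checks are simplified.
"""
import json

CLASSES = {
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
}
REQUIRED = ("classification", "color", "features", "style", "confidence", "reasoning")
STRING_FIELDS = ("classification", "color", "features", "style", "reasoning")


def check_output(raw_output: str) -> dict:
    """Parse a model response and enforce the schema; raise ValueError on violations."""
    data = json.loads(raw_output)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    extra = sorted(set(data) - set(REQUIRED))
    if extra:  # additionalProperties: false
        raise ValueError(f"unexpected fields: {extra}")
    for key in STRING_FIELDS:
        if not isinstance(data[key], str):
            raise ValueError(f"{key} must be a string")
    if data["classification"] not in CLASSES:  # the schema's enum
        raise ValueError(f"unknown class: {data['classification']!r}")
    confidence = data["confidence"]
    # Simplified "integer" check: bool is an int subclass in Python, so exclude it;
    # floats such as 3.0 are rejected here even though newer JSON Schema drafts accept them.
    if isinstance(confidence, bool) or not isinstance(confidence, int):
        raise ValueError("confidence must be an integer")
    return data


def accuracy(raw_outputs: list[str], labels: list[str]) -> float:
    """Share of responses that pass check_output and whose class equals the label."""
    if len(raw_outputs) != len(labels):
        raise ValueError("need one label per output")
    if not labels:
        return 0.0
    correct = 0
    for raw, label in zip(raw_outputs, labels):
        try:
            if check_output(raw)["classification"] == label:
                correct += 1
        except ValueError:
            pass  # unparseable or schema-violating output counts as incorrect
    return correct / len(labels)
```

Treating an unparseable or schema-violating response as a miss is a choice made for this sketch; it folds format errors into the accuracy number rather than reporting them separately.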