chore(examples): modernize image classification example (promptfoo#2197)
mldangelo authored Nov 22, 2024
1 parent c9f65a8 commit e1b44fb
Showing 4 changed files with 111 additions and 75 deletions.
1 change: 1 addition & 0 deletions .vscode/settings.json
@@ -28,6 +28,7 @@
"jailbreaking",
"leaderboard",
"leetspeak",
"Lightbox",
"Logform",
"logprobs",
"Mateo",
35 changes: 21 additions & 14 deletions examples/image-classification/README.md
@@ -1,36 +1,43 @@
# Image Classification Example with Promptfoo

This example demonstrates how to use Promptfoo for image classification tasks using the Fashion MNIST dataset. It showcases prompt engineering, configuration, and evaluation of AI models for image analysis. We use a prompt designed to output XML and compare class labels from the dataset with the model's output. Additional attributes in the XML illustrate how to extract more information using multi-modal models. This example is set up to use Anthropic, but you can easily switch to GPT-4 or other models by modifying the provider in the config file. You may need to adjust the prompt to match your model's output format and experiment with different prompts to see how they affect performance.
This example demonstrates how to use Promptfoo for image classification tasks using the Fashion MNIST dataset. The example uses GPT-4o and GPT-4o-mini with a JSON schema for structured output to analyze images, including classification, color analysis, and additional attributes.

## Getting Started

1. Generate the dataset:
1. Set up your OpenAI API key:

```sh
python dataset_gen.py
export OPENAI_API_KEY='your-api-key'
```

Note: You may need to install dependencies with:
2. Run the evaluation:

```sh
pip install -r requirements.txt
npx promptfoo@latest eval
```

This script creates a CSV file with 100 random images from the Fashion MNIST dataset and their labels. A CSV with 10 sample images is included so you can skip this step if preferred.
3. View the results:

2. Run the evaluation:
```sh
npx promptfoo@latest view
```

4. Optionally, re-generate or update the dataset:

```sh
npx promptfoo@latest eval
python dataset_gen.py
```

3. View the results:
Note: You may need to install dependencies with:

```sh
npx promptfoo@latest view
pip install -r requirements.txt
```

4. Modify the prompt to see how it affects the model's performance. For example, try:
- adding `Begin with <analysis>` to the end of the prompt to make the `is-xml` assertion pass.
- changing the prompt to output JSON instead of XML.
- Modifying the prompt to include the classification within the xml `<classification>[T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot]</classification>`.
This script creates a CSV file with 100 random images from the Fashion MNIST dataset and their labels. A CSV with 10 sample images is included so you can skip this step if preferred.

5. Experiment with the configuration:
- Modify the JSON schema in `promptfooconfig.yaml` to add or adjust required fields
- Try different models such as llama3.2 or Claude 3.5 by changing the provider in the config
- Adjust the system prompt to improve classification accuracy
- Add additional assertions to validate model outputs
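
As a sketch of the last suggestion: the schema in this commit types `confidence` as an integer but does not bound it, so an extra assertion could check the 1-10 range described in the prompt. The file name `confidenceCheck.js` and the function name are hypothetical, and the sketch assumes the model output arrives either as a JSON string or as an already-parsed object.

```javascript
// confidenceCheck.js (hypothetical file, not part of this commit).
// Assumption: the first argument is the model output, either a JSON string
// or an object that has already been parsed.
function checkConfidence(output) {
  const parsed = typeof output === 'string' ? JSON.parse(output) : output;
  const { confidence } = parsed;

  // The prompt asks for a confidence level between 1 and 10 inclusive.
  const pass = Number.isInteger(confidence) && confidence >= 1 && confidence <= 10;

  return {
    pass,
    score: pass ? 1 : 0,
    reason: pass
      ? 'confidence is within 1-10'
      : `confidence out of range: ${confidence}`,
  };
}

module.exports = checkConfidence;
```

If this matches how your Promptfoo version loads assertion files, it could be referenced from `promptfooconfig.yaml` as a `javascript` assertion whose `value` is `file://confidenceCheck.js`.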
68 changes: 24 additions & 44 deletions examples/image-classification/prompt.js
@@ -4,55 +4,35 @@ module.exports = (context) => {
return [
{
role: 'system',
content: `
You are an AI assistant tasked with analyzing and classifying images. Your goal is to determine the type of clothing item depicted in the image and provide additional relevant information.
Please perform the following tasks:
1. Classify the image into one of the following categories:
- T-shirt/top
- Trouser
- Pullover
- Dress
- Coat
- Sandal
- Shirt
- Sneaker
- Bag
- Ankle boot
2. Provide the following additional information about the image:
a) The primary color or color scheme of the item
b) Any notable features or patterns on the item
c) The approximate style or era the item might belong to (e.g., modern, vintage, classic)
3. Estimate the confidence level of your classification on a scale of 1-10, where 1 is least confident and 10 is most confident.
Please provide your analysis in the following format:
<analysis>
<classification>[Insert the category here]</classification>
<color>[Describe the primary color or color scheme]</color>
<features>[Describe any notable features or patterns]</features>
<style>[Describe the approximate style or era]</style>
<confidence>[Insert your confidence level (1-10)]</confidence>
<reasoning>[Provide a brief explanation for your classification and confidence level]</reasoning>
</analysis>
Remember to base your analysis solely on the provided image. Do not make assumptions about information that is not explicitly stated or strongly implied by the description.
Begin with <analysis>
`,
content: dedent`
You are an AI assistant tasked with analyzing and classifying images. Your goal is to determine the type of clothing item depicted in the image and provide additional relevant information.
Please analyze the image and provide:
1. Classification (must be one of: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot)
2. Primary color or color scheme
3. Notable features or patterns
4. Approximate style or era (e.g., modern, vintage, classic)
5. Confidence level (1-10, where 1 is least confident and 10 is most confident)
6. Brief reasoning for the classification
Provide your response as a JSON object with the following structure:
{
"classification": string (one of the allowed categories),
"color": string,
"features": string,
"style": string,
"confidence": number (1-10),
"reasoning": string
}
`,
},
{
role: 'user',
content: [
{
type: 'image',
source: {
type: 'base64',
media_type: 'image/jpeg',
data: context.vars.image_base64,
type: 'image_url',
image_url: {
url: `data:image/jpeg;base64,${context.vars.image_base64}`,
},
},
],
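
The hunk above replaces the Anthropic-style image block (`type: 'image'` with a base64 `source`) with an OpenAI-style `image_url` part carrying a data URL. A minimal sketch of the two shapes side by side, using hypothetical helper names that do not appear in the commit:

```javascript
// Illustrative helpers only; the names are assumptions, not part of prompt.js.

// OpenAI-style content part: the image travels as a data URL.
function toOpenAIImagePart(base64, mediaType = 'image/jpeg') {
  return {
    type: 'image_url',
    image_url: { url: `data:${mediaType};base64,${base64}` },
  };
}

// Anthropic-style content part: the raw base64 payload plus its media type.
function toAnthropicImagePart(base64, mediaType = 'image/jpeg') {
  return {
    type: 'image',
    source: { type: 'base64', media_type: mediaType, data: base64 },
  };
}

// Placeholder bytes stand in for a real JPEG here.
const sample = Buffer.from('not-a-real-image').toString('base64');
console.log(toOpenAIImagePart(sample).image_url.url);
console.log(toAnthropicImagePart(sample).source.media_type);
```

Keeping both builders around would make it easier to switch providers, as the README suggests, without rewriting the rest of the prompt.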
82 changes: 65 additions & 17 deletions examples/image-classification/promptfooconfig.yaml
Original file line number Diff line number Diff line change
@@ -1,24 +1,72 @@
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: Image Classification Example of Fashion MNIST dataset
providers:
- id: anthropic:messages:claude-3-5-sonnet-20241022
label: claude-3.5-sonnet
- openai:chat:gpt-4o
- openai:chat:gpt-4o-mini
prompts:
- id: file://prompt.js
label: Image Classification
tests: file://fashion_mnist_sample_base64.csv
- label: Image Classification
raw: file://prompt.js
config:
response_format:
type: json_schema
json_schema:
name: image_classification
schema:
type: object
properties:
classification:
type: string
enum:
[
'T-shirt/top',
'Trouser',
'Pullover',
'Dress',
'Coat',
'Sandal',
'Shirt',
'Sneaker',
'Bag',
'Ankle boot',
]
color:
type: string
features:
type: string
style:
type: string
confidence:
type: integer
reasoning:
type: string
required:
- classification
- color
- features
- style
- confidence
- reasoning
additionalProperties: false
defaultTest:
assert:
- type: contains-xml
- type: is-json
value:
requiredElements:
- analysis.classification
- analysis.color
- analysis.confidence
- analysis.features
- analysis.reasoning
- analysis.style
- type: is-xml
value: 'analysis.classification,analysis.color,analysis.features,analysis.style,analysis.confidence,analysis.reasoning'
- type: contains
value: '<classification>{{label}}</classification>'
type: object
properties:
classification:
type: string
color:
type: string
features:
type: string
style:
type: string
confidence:
type: integer
reasoning:
type: string
- type: javascript
value: 'output.classification === context.vars.label'
metric: accuracy

tests: file://fashion_mnist_sample_base64.csv
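
The `javascript` assertion above compares `output.classification` with the `label` column of each CSV row and reports the result under the `accuracy` metric. How Promptfoo aggregates that metric is not shown in this commit; the sketch below takes one plausible reading, the fraction of rows whose classification matches the label. The `results` array and its field names are made up for illustration.

```javascript
// Assumption: each result pairs an already-parsed model output with the
// ground-truth label from the CSV. The data below is invented.
function accuracy(results) {
  if (results.length === 0) return 0;
  const correct = results.filter(
    ({ output, label }) => output.classification === label,
  ).length;
  return correct / results.length;
}

const results = [
  { output: { classification: 'Sneaker' }, label: 'Sneaker' },
  { output: { classification: 'Shirt' }, label: 'T-shirt/top' },
  { output: { classification: 'Bag' }, label: 'Bag' },
  { output: { classification: 'Coat' }, label: 'Pullover' },
];

console.log(accuracy(results)); // 0.5 (2 of 4 rows match)
```

Because the comparison is strict equality, a label mismatch such as `Shirt` versus `T-shirt/top` counts as wrong, which is why the schema restricts `classification` to the exact dataset labels.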
