chore(examples): modernize image classification example (promptfoo#2197)

sparticleinc · Nov 22, 2024 · e1b44fb
1 parent c9f65a8

Showing 4 changed files with 111 additions and 75 deletions.
diff --git a/.vscode/settings.json b/.vscode/settings.json
@@ -28,6 +28,7 @@
     "jailbreaking",
     "leaderboard",
     "leetspeak",
+    "Lightbox",
     "Logform",
     "logprobs",
     "Mateo",

diff --git a/examples/image-classification/README.md b/examples/image-classification/README.md
@@ -1,36 +1,43 @@
 # Image Classification Example with Promptfoo
 
-This example demonstrates how to use Promptfoo for image classification tasks using the Fashion MNIST dataset. It showcases prompt engineering, configuration, and evaluation of AI models for image analysis. We use a prompt designed to output XML and compare class labels from the dataset with the model's output. Additional attributes in the XML illustrate how to extract more information using multi-modal models. This example is set up to use Anthropic, but you can easily switch to GPT-4 or other models by modifying the provider in the config file. You may need to adjust the prompt to match your model's output format and experiment with different prompts to see how they affect performance.
+This example demonstrates how to use Promptfoo for image classification tasks using the Fashion MNIST dataset. The example uses GPT-4o and GPT-4o-mini with a structured JSON schema to analyze images, including classification, color analysis, and additional attributes.
 
 ## Getting Started
 
-1. Generate the dataset:
+1. Set up your OpenAI API key:
 
    ```sh
-   python dataset_gen.py
+   export OPENAI_API_KEY='your-api-key'
    ```
 
-   Note: You may need to install dependencies with:
+2. Run the evaluation:
 
    ```sh
-   pip install -r requirements.txt
+   npx promptfoo@latest eval
    ```
 
-   This script creates a CSV file with 100 random images from the Fashion MNIST dataset and their labels. A CSV with 10 sample images is included so you can skip this step if preferred.
+3. View the results:
 
-2. Run the evaluation:
+   ```sh
+   npx promptfoo@latest view
+   ```
+
+4. Optionally, re-generate or update the dataset:
 
    ```sh
-   npx promptfoo@latest eval
+   python dataset_gen.py
    ```
 
-3. View the results:
+   Note: You may need to install dependencies with:
 
    ```sh
-   npx promptfoo@latest view
+   pip install -r requirements.txt
    ```
 
-4. Modify the prompt to see how it affects the model's performance. For example, try:
-   - adding `Begin with <analysis>` to the end of the prompt to make the `is-xml` assertion pass.
-   - changing the prompt to output JSON instead of XML.
-   - Modifying the prompt to include the classification within the xml `<classification>[T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot]</classification>`.
+   This script creates a CSV file with 100 random images from the Fashion MNIST dataset and their labels. A CSV with 10 sample images is included so you can skip this step if preferred.
+
+5. Experiment with the configuration:
+   - Modify the JSON schema in `promptfooconfig.yaml` to add or adjust required fields
+   - Try different models such as llama3.2 or Claude 3.5 by changing the provider in the config
+   - Adjust the system prompt to improve classification accuracy
+   - Add additional assertions to validate model outputs
diff --git a/examples/image-classification/prompt.js b/examples/image-classification/prompt.js
@@ -4,55 +4,35 @@ module.exports = (context) => {
   return [
     {
       role: 'system',
-      content: `
-    You are an AI assistant tasked with analyzing and classifying images. Your goal is to determine the type of clothing item depicted in the image and provide additional relevant information.
-
-    Please perform the following tasks:
-
-    1. Classify the image into one of the following categories:
-      - T-shirt/top
-      - Trouser
-      - Pullover
-      - Dress
-      - Coat
-      - Sandal
-      - Shirt
-      - Sneaker
-      - Bag
-      - Ankle boot
-
-    2. Provide the following additional information about the image:
-      a) The primary color or color scheme of the item
-      b) Any notable features or patterns on the item
-      c) The approximate style or era the item might belong to (e.g., modern, vintage, classic)
-
-    3. Estimate the confidence level of your classification on a scale of 1-10, where 1 is least confident and 10 is most confident.
-
-    Please provide your analysis in the following format:
-
-    <analysis>
-    <classification>[Insert the category here]</classification>
-    <color>[Describe the primary color or color scheme]</color>
-    <features>[Describe any notable features or patterns]</features>
-    <style>[Describe the approximate style or era]</style>
-    <confidence>[Insert your confidence level (1-10)]</confidence>
-    <reasoning>[Provide a brief explanation for your classification and confidence level]</reasoning>
-    </analysis>
-
-    Remember to base your analysis solely on the provided image. Do not make assumptions about information that is not explicitly stated or strongly implied by the description.
-
-    Begin with <analysis>
-    `,
+      content: dedent`
+        You are an AI assistant tasked with analyzing and classifying images. Your goal is to determine the type of clothing item depicted in the image and provide additional relevant information.
+
+        Please analyze the image and provide:
+        1. Classification (must be one of: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot)
+        2. Primary color or color scheme
+        3. Notable features or patterns
+        4. Approximate style or era (e.g., modern, vintage, classic)
+        5. Confidence level (1-10, where 1 is least confident and 10 is most confident)
+        6. Brief reasoning for the classification
+
+        Provide your response as a JSON object with the following structure:
+        {
+          "classification": string (one of the allowed categories),
+          "color": string,
+          "features": string,
+          "style": string,
+          "confidence": number (1-10),
+          "reasoning": string
+        }
+      `,
     },
     {
       role: 'user',
       content: [
         {
-          type: 'image',
-          source: {
-            type: 'base64',
-            media_type: 'image/jpeg',
-            data: context.vars.image_base64,
+          type: 'image_url',
+          image_url: {
+            url: `data:image/jpeg;base64,${context.vars.image_base64}`,
           },
         },
       ],
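
Note on the hunk above: the image part changes from the Anthropic-style `source` object to an OpenAI-style `image_url` carrying a data URL. The sketch below shows the two shapes side by side. The helper names `toImageUrlPart` and `toBase64SourcePart` are hypothetical and not part of this commit; they only restate the removed and added lines as functions.

```javascript
// Hypothetical helpers (not in the commit) that restate the two message shapes.

// OpenAI-style part, as added in this commit: a data URL inside `image_url`.
function toImageUrlPart(base64, mediaType = 'image/jpeg') {
  if (typeof base64 !== 'string' || base64.length === 0) {
    throw new TypeError('base64 must be a non-empty string');
  }
  return {
    type: 'image_url',
    image_url: { url: `data:${mediaType};base64,${base64}` },
  };
}

// Anthropic-style part, as removed in this commit: raw base64 inside `source`.
function toBase64SourcePart(base64, mediaType = 'image/jpeg') {
  if (typeof base64 !== 'string' || base64.length === 0) {
    throw new TypeError('base64 must be a non-empty string');
  }
  return {
    type: 'image',
    source: { type: 'base64', media_type: mediaType, data: base64 },
  };
}

// Example: both parts carry the same payload, only the envelope differs.
const sample = 'QUJD'; // base64 for "ABC", used purely as placeholder data
console.log(toImageUrlPart(sample).image_url.url);
console.log(toBase64SourcePart(sample).source.data);
```

Keeping both shapes behind small functions like these is one way to switch providers without editing the message array by hand, though the commit itself simply inlines the OpenAI form.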

diff --git a/examples/image-classification/promptfooconfig.yaml b/examples/image-classification/promptfooconfig.yaml
@@ -1,24 +1,72 @@
 # yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
 description: Image Classification Example of Fashion MNIST dataset
 providers:
-  - id: anthropic:messages:claude-3-5-sonnet-20241022
-    label: claude-3.5-sonnet
+  - openai:chat:gpt-4o
+  - openai:chat:gpt-4o-mini
 prompts:
-  - id: file://prompt.js
-    label: Image Classification
-tests: file://fashion_mnist_sample_base64.csv
+  - label: Image Classification
+    raw: file://prompt.js
+    config:
+      response_format:
+        type: json_schema
+        json_schema:
+          name: image_classification
+          schema:
+            type: object
+            properties:
+              classification:
+                type: string
+                enum:
+                  [
+                    'T-shirt/top',
+                    'Trouser',
+                    'Pullover',
+                    'Dress',
+                    'Coat',
+                    'Sandal',
+                    'Shirt',
+                    'Sneaker',
+                    'Bag',
+                    'Ankle boot',
+                  ]
+              color:
+                type: string
+              features:
+                type: string
+              style:
+                type: string
+              confidence:
+                type: integer
+              reasoning:
+                type: string
+            required:
+              - classification
+              - color
+              - features
+              - style
+              - confidence
+              - reasoning
+            additionalProperties: false
 defaultTest:
   assert:
-    - type: contains-xml
+    - type: is-json
       value:
-        requiredElements:
-          - analysis.classification
-          - analysis.color
-          - analysis.confidence
-          - analysis.features
-          - analysis.reasoning
-          - analysis.style
-    - type: is-xml
-      value: 'analysis.classification,analysis.color,analysis.features,analysis.style,analysis.confidence,analysis.reasoning'
-    - type: contains
-      value: '<classification>{{label}}</classification>'
+        type: object
+        properties:
+          classification:
+            type: string
+          color:
+            type: string
+          features:
+            type: string
+          style:
+            type: string
+          confidence:
+            type: integer
+          reasoning:
+            type: string
+    - type: javascript
+      value: 'output.classification === context.vars.label'
+      metric: accuracy
+
+tests: file://fashion_mnist_sample_base64.csv
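
Note on the `javascript` assertion above: the inline expression `output.classification === context.vars.label` relies on `output` already being a parsed object. A more defensive variant might look like the sketch below. It assumes `output` could arrive either as an object or as a JSON string; that is an assumption for illustration, not something this commit states. The names `parseOutput` and `isAccurate` are hypothetical, and the label list is copied from the `enum` in the schema.

```javascript
// Labels copied from the `enum` in promptfooconfig.yaml.
const ALLOWED_LABELS = [
  'T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
  'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot',
];

// Hypothetical helper: accept an object as-is, or try to parse a JSON string.
// Returns null when the value cannot be treated as an object.
function parseOutput(output) {
  if (output !== null && typeof output === 'object') {
    return output;
  }
  if (typeof output !== 'string') {
    return null;
  }
  try {
    const parsed = JSON.parse(output);
    return parsed !== null && typeof parsed === 'object' ? parsed : null;
  } catch {
    return null;
  }
}

// Hypothetical check: true only when the classification is an allowed label
// and equals the expected label from the dataset row.
function isAccurate(output, label) {
  const parsed = parseOutput(output);
  if (parsed === null) {
    return false;
  }
  return (
    ALLOWED_LABELS.includes(parsed.classification) &&
    parsed.classification === label
  );
}

console.log(isAccurate({ classification: 'Bag' }, 'Bag'));
console.log(isAccurate('{"classification":"Coat"}', 'Coat'));
console.log(isAccurate('not json', 'Coat'));
```

The one-line expression in the config is shorter and is enough when the provider returns a parsed object; the longer form only matters if raw strings can reach the assertion.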