Skip to content

Latest commit

 

History

History

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 

README.md

Model Performance Evaluation (evaluate.py)

This script is used to evaluate the model's performance on a test set. It can operate in two modes:

  • pair: Calculates pairwise accuracy.
  • ranking: Calculates ranking accuracy.

Pair-wise Sample

We set path1's image is better than path2's image for simplicity.

[
    {
        "prompt": ".....",
        "path1": ".....",
        "path2": "....."
    },
    {
        "prompt": ".....",
        "path1": ".....",
        "path2": "....."
    },
  ...
]

Rank-wise Sample

[
    {
        "id": "005658-0040",
        "prompt": ".....",
        "generations": [
            "path to image1",
            "path to image2",
            "path to image3",
            "path to image4"
        ],
        "ranking": [
            1,
            2,
            5,
            3
        ]
    },
  ...
]

Usage

python evaluate/evaluate.py \
  --test_json /path/to/your/test_data.json \
  --config_path config/HPSv3_7B.yaml \
  --checkpoint_path checkpoints/HPSv3_7B/model.pth \
  --mode pair \
  --batch_size 8 \
  --num_processes 8

Arguments:

  • --test_json: (Required) Path to the JSON file containing evaluation data.
  • --config_path: (Required) Path to the model's configuration file.
  • --checkpoint_path: (Required) Path to the model checkpoint.
  • --mode: The evaluation mode. Can be pair or ranking. (Default: pair)
  • --batch_size: Batch size for inference. (Default: 8)
  • --num_processes: Number of parallel processes to use. (Default: 8)

Reward Benchmarking (benchmark.py)

This script is used to run inference with a reward model over one or more folders of images. It calculates a reward score for each image based on its corresponding text prompt (expected in a .txt file with the same name). The script then outputs statistics (mean, std, min, max) for each folder and saves the detailed results to a JSON file.

It supports multiple reward models through the --model_type argument.

Usage

The script is run using argparse. Below is a command-line example:

python evaluate/benchmark.py \
  --config_path config/HPSv3_7B.yaml \
  --checkpoint_path checkpoints/HPSv3_7B/model.pth \
  --model_type hpsv3 \
  --image_folders /path/to/images/folder1 /path/to/images/folder2 \
  --output_path ./benchmark_results.json \
  --batch_size 16 \
  --num_processes 8

Arguments:

  • --config_path: (Required) Path to the model's configuration file.
  • --checkpoint_path: (Required) Path to the model checkpoint.
  • --model_type: The reward model to use. Choices: hpsv3, hpsv2, imagereward. (Default: hpsv3)
  • --image_folders: (Required) One or more paths to folders containing the images to benchmark.
  • --output_path: (Required) Path to save the output JSON file with results.
  • --batch_size: Batch size for processing. (Default: 16)
  • --num_processes: Number of parallel processes to use. (Default: 8)
  • --num_machines: For distributed inference, the total number of machines. (Default: 1)
  • --machine_id: For distributed inference, the ID of the current machine. (Default: 0)