The Visual SWE-bench directory contains the following dataset files:
- data.json: Visual SWE-bench dataset following the SWE-bench structure.
- list_data.json: Dataset where the problem_statement field is a list of strings, each of which is an image hyperlink, a video hyperlink, or issue text.
- list_data_onlyimage.json: A subset of list_data.json, where the visual data in problem_statement consists only of image hyperlinks.
- list_data_onlyvideo.json: A subset of list_data.json, where the visual data in problem_statement consists only of video hyperlinks.
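To get a feel for the list-style problem_statement field, the sketch below labels each entry as an image link, a video link, or issue text. It is only an illustration: the extension sets and the function names `classify_item` and `summarize` are assumptions, not part of the dataset or the CodeV code, and links without a recognizable file extension are reported as a generic "link".

```python
from collections import Counter
from posixpath import splitext
from urllib.parse import urlparse

# Assumed extension sets; they are not taken from the dataset documentation.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg"}
VIDEO_EXTS = {".mp4", ".mov", ".webm"}


def classify_item(item: str) -> str:
    """Label one problem_statement entry as 'image', 'video', 'link', or 'text'."""
    stripped = item.strip()
    # Anything that is not a single http(s) URL is treated as issue text.
    if not stripped.startswith(("http://", "https://")) or len(stripped.split()) != 1:
        return "text"
    ext = splitext(urlparse(stripped).path)[1].lower()
    if ext in IMAGE_EXTS:
        return "image"
    if ext in VIDEO_EXTS:
        return "video"
    return "link"  # a URL whose type cannot be told from its extension


def summarize(problem_statement: list[str]) -> Counter:
    """Count how many entries of each kind a problem_statement list holds."""
    return Counter(classify_item(item) for item in problem_statement)
```

Assuming list_data.json is a JSON list of instances, `summarize(instance["problem_statement"])` could be called on each instance after loading the file with `json.load`.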
To build CodeV from source, follow these steps:
git clone https://github.com/luolin101/CodeV.git
cd CodeV
pip install -e .
To configure the visual language model (VLM) used in this project, update the .env file with the following settings:
- base_url: The base URL for the VLM API.
- api_key: Your API key for authentication.
- model: The specific model you intend to use.
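A quick check that the three settings are present can save a failed run later. The sketch below assumes the .env file uses plain KEY=VALUE lines with the key names listed above; the helper names `parse_env` and `missing_settings` are hypothetical, and this is not how CodeV itself reads the file.

```python
# Setting names taken from the list above.
REQUIRED = ("base_url", "api_key", "model")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove optional surrounding quotes from the value.
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def missing_settings(settings: dict[str, str]) -> list[str]:
    """Return the required settings that are absent or empty."""
    return [key for key in REQUIRED if not settings.get(key)]
```

For example, `missing_settings(parse_env(open(".env").read()))` returns an empty list when all three settings are filled in.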
1. Process Image
python dataProcess_image.py --dataset "Visual SWE-bench/list_data_onlyimage.json" \
--out_folder output_folder_iamge
This writes the processed image information to the directory passed to --out_folder.
2. Process Video
python dataProcess_video.py --dataset "Visual SWE-bench/list_data_onlyvideo.json" \
--out_folder output_folder_video
This will generate the processed video information under the output_folder_video directory.
Note: The VLM may not strictly follow the JSON format required by the prompt when generating responses, so some outputs may need manual adjustment.
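Some format deviations can be caught before editing by hand. The sketch below pulls the first parseable JSON object out of a response that wraps it in extra prose or code fences. The function name `extract_json` is hypothetical and the approach is an assumption about typical failures, not CodeV's own handling; responses with malformed JSON (for example trailing commas) still return None and need manual repair.

```python
import json


def extract_json(response: str):
    """Return the first JSON object embedded in a model response, or None."""
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever text follows it.
            obj, _ = decoder.raw_decode(response, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = response.find("{", start + 1)
    return None
```

Files for which this returns None are the ones to inspect manually.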
1. Add Image Information
python addImage.py --dataset "Visual SWE-bench/list_data_onlyimage.json" \
--in_folder output_folder_iamge
This will generate data_with_image.json with the image information appended to the problem_statement field.
2. Add Video Information
python addVideo.py --dataset "Visual SWE-bench/list_data_onlyvideo.json" \
--in_folder output_folder_video
This will generate data_with_video.json with the video information appended to the problem_statement field.
3. Merge Data
python mergeData.py --file1 data_with_image.json \
--file2 data_with_video.json
At this point, all visual issues have been converted to textual form and saved in processed_data.json.
4. Use Textual Issue Resolving Approaches
After obtaining processed_data.json, use a textual issue resolving approach such as SWE-agent or Agentless to resolve the issues in the data.
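Before handing processed_data.json to a resolver, it may be worth confirming that the merge kept every instance exactly once. The sketch below assumes each of the three files is a JSON list of objects carrying an instance_id field, as in the SWE-bench structure; `check_merge` is a hypothetical helper and does not reproduce what mergeData.py does.

```python
from collections import Counter


def check_merge(image_items, video_items, merged_items):
    """Compare instance ids in the merged data against the two inputs."""
    expected = {item["instance_id"] for item in image_items}
    expected |= {item["instance_id"] for item in video_items}
    counts = Counter(item["instance_id"] for item in merged_items)
    merged_ids = set(counts)
    return {
        # ids present in an input file but absent from the merged file
        "missing": sorted(expected - merged_ids),
        # ids in the merged file that came from neither input
        "unexpected": sorted(merged_ids - expected),
        # ids that appear more than once in the merged file
        "duplicated": sorted(i for i, n in counts.items() if n > 1),
    }
```

All three lists being empty suggests the merged file covers the image and video subsets without overlap.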
Use visualswebench.harness.run_evaluation to evaluate your predictions on Visual SWE-bench:
python -m visualswebench.harness.run_evaluation \
--dataset_name "Visual SWE-bench/data.json" \
--predictions_path <path_to_predictions> \
--max_workers <num_workers> \
--run_id <run_id>
# use --predictions_path 'gold' to verify the gold patches
# use --run_id to name the evaluation run
You can also evaluate on specific issue instances:
python -m visualswebench.harness.run_evaluation \
--dataset_name "Visual SWE-bench/data.json" \
--predictions_path <path_to_predictions> \
--max_workers <num_workers> \
--run_id <run_id> \
--instance_ids <instance_id>
The outputs include:
- docker build logs under the build_image_logs directory
- evaluation logs under the run_instance_logs directory
- a result summary in the <prediction_file_name>.<run_id>.json file
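The format of the file passed to --predictions_path is not described here. The sketch below assumes the harness accepts the SWE-bench prediction format, that is, one JSON object per line with instance_id, model_name_or_path, and model_patch; confirm this against the harness before relying on it. The helper names and the example instance id are hypothetical.

```python
import json


def build_predictions(patches: dict[str, str], model_name: str) -> list[dict]:
    """Turn {instance_id: patch_text} into SWE-bench-style prediction records."""
    return [
        {
            "instance_id": instance_id,
            "model_name_or_path": model_name,
            "model_patch": patch,
        }
        for instance_id, patch in patches.items()
    ]


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines, one prediction per line."""
    return "".join(json.dumps(record) + "\n" for record in records)
```

Writing `to_jsonl(build_predictions({"owner__repo-123": diff_text}, "my-model"))` to a file would give a path that can be passed as <path_to_predictions>, under the assumption stated above.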