Fixes #705
This pull request adds a new CLIPScore metric for evaluating the alignment between images and their corresponding text descriptions. The implementation includes the metric logic, a Gradio-based web interface, and documentation. The most important changes are summarized below:
New Metric Implementation:
Adds clip_score.py, implementing the CLIPScore metric using the CLIP model from HuggingFace Transformers, including logic for computing the similarity between image and text pairs.
Web Interface:
Adds app.py, a Gradio-based interface that allows users to interactively compute CLIPScore by uploading an image and entering text, with example inputs and integration of the metric.
Documentation:
Adds README.md describing the metric, usage instructions, input/output formats, example code, and citation information.
Dependency Management:
Updates requirements.txt and setup.py to include the transformers library needed for CLIPScore functionality.
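
For reviewers unfamiliar with the metric, the similarity step described for clip_score.py can be sketched roughly as below. This is not the code in this PR: it assumes the formulation from the original CLIPScore paper (a rescaled, clipped cosine similarity with weight 2.5), and it takes already-computed embeddings as plain lists of floats rather than calling the CLIP model. The function names and the `weight` parameter are illustrative; the PR's actual scaling and API may differ.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def clip_score(
    image_embedding: Sequence[float],
    text_embedding: Sequence[float],
    weight: float = 2.5,  # assumption: rescaling weight from the CLIPScore paper
) -> float:
    """Hypothetical helper: weight * max(cos(image, text), 0)."""
    # Negative similarities are clipped to zero before rescaling.
    return weight * max(cosine_similarity(image_embedding, text_embedding), 0.0)
```

In the real metric the two vectors would be the image and text embeddings produced by CLIP; the sketch only shows how a pair of embeddings is turned into a score.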