Added score for duplicate images #183
Codecov Report
@@            Coverage Diff             @@
##             main     #183      +/-   ##
==========================================
- Coverage   94.26%   94.18%   -0.08%
==========================================
  Files          16       16
  Lines         889      895       +6
  Branches      164      164
==========================================
+ Hits          838      843       +5
- Misses         30       32       +2
+ Partials       21       20       -1
    lambda x: True if x in duplicated_images else False
)
score = 1.0 / len(s)
score_df.loc[s, score_col] = score
Can you cover this line in a unit test, if that's easy?
Covered in a unit test.
@@ -171,7 +171,7 @@ def test_hf_dataset_run(generate_local_dataset, n_classes, images_per_class):
     imagelab = Imagelab(hf_dataset=hf_dataset, image_key="image")
     imagelab.find_issues()
     imagelab.report()
-    assert len(imagelab.issues.columns) == 14
+    assert len(imagelab.issues.columns) == 16
     assert len(imagelab.issues) == n_classes * images_per_class
What is the most duplicated image in our testing dataset? Couldn't we easily add a test to verify that this image has the lowest score (tied with the other images in its duplicate set)? The current tests don't seem to exercise any of the scoring logic at all.
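A test along those lines could look roughly like the sketch below. It is only an illustration: `score_duplicate_sets` is a hypothetical stand-in for the scoring logic in this PR, the file names and duplicate sets are made up, and giving non-duplicated images a score of 1.0 is an assumption rather than something confirmed in this thread.

```python
import pandas as pd


def score_duplicate_sets(index, duplicate_sets):
    # Hypothetical stand-in for the PR's scoring logic.
    # Assumption: images outside any duplicate set keep a score of 1.0.
    scores = pd.Series(1.0, index=index)
    for s in duplicate_sets:
        # Every image in a duplicate set gets 1 / (size of that set).
        scores.loc[s] = 1.0 / len(s)
    return scores


def test_largest_duplicate_set_has_lowest_score():
    # Made-up dataset: one set of three duplicates, one set of two, one unique image.
    index = ["a.png", "b.png", "c.png", "d.png", "e.png", "f.png"]
    duplicate_sets = [["a.png", "b.png", "c.png"], ["d.png", "e.png"]]

    scores = score_duplicate_sets(index, duplicate_sets)
    largest = max(duplicate_sets, key=len)

    # All images in the largest duplicate set share the minimum score...
    assert (scores.loc[largest] == scores.min()).all()
    # ...and every other image scores strictly higher.
    assert (scores.drop(largest) > scores.min()).all()
```

In the real test suite the duplicate sets would come from the generated dataset fixture rather than a hard-coded list.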
Covered in a unit test.
Co-authored-by: Jonas Mueller <1390638+jwmueller@users.noreply.github.com>
LGTM!
Cool that you're sorting the duplicate issues by the duplicate counts when visualizing.
Added a score for the exact_duplicates and near_duplicates issue types.
score = 1 / num_images_in_duplicated_set
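As a minimal sketch of how this formula plays out, the snippet below assigns scores to a small made-up set of images. The helper name `duplicate_scores`, the column name, and the file names are hypothetical, and the default score of 1.0 for images that are not in any duplicate set is an assumption, not something stated in this PR.

```python
import pandas as pd


def duplicate_scores(image_paths, duplicate_sets, score_col="near_duplicates_score"):
    # Hypothetical helper illustrating score = 1 / num_images_in_duplicated_set.
    # Assumption: images that belong to no duplicate set start at 1.0.
    score_df = pd.DataFrame({score_col: 1.0}, index=list(image_paths))
    for s in duplicate_sets:
        # Larger duplicate sets yield lower scores for each of their members.
        score_df.loc[list(s), score_col] = 1.0 / len(s)
    return score_df


scores = duplicate_scores(
    ["a.png", "b.png", "c.png", "d.png", "e.png", "f.png"],
    [["a.png", "b.png", "c.png"], ["d.png", "e.png"]],
)
# a.png, b.png, c.png share a set of three -> 1/3 each
# d.png, e.png share a set of two          -> 1/2 each
# f.png is in no set                       -> 1.0 (assumed default)
print(scores.sort_values("near_duplicates_score"))
```

Under this scheme, sorting by score in ascending order puts the most heavily duplicated images first.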