Raise an exception on receiving duplicate filepaths #242
sanjanag merged 3 commits into cleanlab:main from
Codecov Report
All modified and coverable lines are covered by tests ✅
@@            Coverage Diff             @@
##              main     #242     +/-   ##
==========================================
- Coverage    96.04%   95.84%   -0.20%
==========================================
  Files           16       16
  Lines          986      987      +1
  Branches       194      194
==========================================
- Hits           947      946      -1
- Misses          20       21      +1
- Partials        19       20      +1
LGTM, made a small suggestion you can consider, but feel free to merge.
Does this fix warrant releasing a new version of CleanVision? I'm not sure how you ran into the bug that inspired this simple fix (why did you have duplicate filepaths in the first place?).
Fixes #236
Co-authored-by: Jonas Mueller <1390638+jwmueller@users.noreply.github.com>
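The change described by the title is small: reject input that lists the same image filepath more than once, before any further processing. A minimal sketch of such a check, assuming filepaths arrive as an iterable of strings; the function name `check_no_duplicate_filepaths`, the choice of `ValueError`, and the message text are hypothetical and not taken from CleanVision's code:

```python
from collections import Counter
from typing import Iterable, List


def check_no_duplicate_filepaths(filepaths: Iterable[str]) -> List[str]:
    """Hypothetical helper: return the filepaths as a list, raising if any path repeats."""
    paths = list(filepaths)
    counts = Counter(paths)
    # Collect every path that occurs more than once, sorted for a stable message.
    duplicates = sorted(path for path, count in counts.items() if count > 1)
    if duplicates:
        raise ValueError(
            f"Found {len(duplicates)} duplicate filepath(s), e.g. {duplicates[:5]}. "
            "Each image filepath must appear only once."
        )
    return paths


# Usage: unique paths pass through unchanged; a repeated path raises ValueError.
print(check_no_duplicate_filepaths(["a.jpg", "b.jpg"]))
try:
    check_no_duplicate_filepaths(["a.jpg", "b.jpg", "a.jpg"])
except ValueError as err:
    print(err)
```

Failing fast with a clear message is the point of the design: the caller learns about the bad input immediately instead of hitting an unrelated-looking failure later.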
I would hold off on a new release, since this is an unusual occurrence. Also, #222 is another bug I want to fix before releasing. I was working with the Amazon Berkeley Objects dataset, which keeps product metadata and images in separate files. I wrote some code to map one to the other, and that mapping is where the duplicate filepaths were introduced (not because of an error in the code, but because of how the metadata file is structured).
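To illustrate how a metadata-to-image mapping can produce repeated paths without any bug in the mapping code, here is a hedged pandas sketch. The two tables, the column names `image_id`, `path`, and `product_id`, and the assumption that several products can reference the same image are all invented for illustration; they are not the actual Amazon Berkeley Objects schema:

```python
import pandas as pd

# Hypothetical image table: one row per image file.
images = pd.DataFrame({
    "image_id": ["img1", "img2"],
    "path": ["images/img1.jpg", "images/img2.jpg"],
})
# Hypothetical product metadata: two products share the same image.
products = pd.DataFrame({
    "product_id": ["p1", "p2", "p3"],
    "image_id": ["img1", "img1", "img2"],
})

# Mapping each product to its image file repeats the path of any shared image.
mapped = products.merge(images, on="image_id", how="left")
filepaths = mapped["path"].tolist()
print(filepaths)
print("duplicate paths:", mapped["path"].duplicated().sum())

# Deduplicate (keeping first-seen order) before passing paths to an image-audit tool.
unique_paths = list(dict.fromkeys(filepaths))
print(unique_paths)
```

Each step is correct on its own; the duplication comes purely from the one-to-many relationship in the metadata.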
The duplicate filepaths caused an out-of-memory error during joins and merges on pandas DataFrames.
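The memory blow-up follows from how pandas joins on non-unique keys. Every left row with a given key is paired with every right row that has the same key, so a path repeated k times on both sides yields k² rows. A small sketch, assuming two hypothetical per-image tables keyed by a `path` column (these are illustrative and not CleanVision's internal structures):

```python
import pandas as pd

# Hypothetical per-image tables that both accidentally repeat one path k times.
k = 500
left = pd.DataFrame({"path": ["a.jpg"] * k, "score_a": range(k)})
right = pd.DataFrame({"path": ["a.jpg"] * k, "score_b": range(k)})

# Many-to-many merge: 2 * k input rows become k * k output rows.
merged = left.merge(right, on="path")
print(len(merged))

# pandas can refuse such a merge up front via the `validate` argument.
try:
    left.merge(right, on="path", validate="one_to_one")
except pd.errors.MergeError as err:
    print("MergeError:", err)
```

With realistic dataset sizes and several repeated paths, this quadratic growth is enough to exhaust memory. Rejecting duplicate filepaths at the input stage presumably prevents such a merge from ever being attempted.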