@lciernik

This PR fixes the following bugs in the feature extraction pipeline (extract_features):

  • BaseExtractor.extract_features:
    • Previously, when storing intermediate results, we reset all features with features=defaultdict(list). This unfortunately discarded the intermediate results of other modules, so the reset is now per module: features[module_name]=[]
    • Paths need to be created before data is stored: previously, an error was thrown if the module/feature path did not exist.
    • We store the intermediate file names. When save_in_one_file=True, output_dir was prepended to them a second time, which is incorrect.
  • PyTorchExtractor calls super().extract_features, so file_name_suffix and save_in_one_file had to be added as arguments in order to pass them to the parent.
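
For illustration, the first two fixes can be sketched roughly as follows. This is a minimal sketch and not the actual thingsvision code: the helper name flush_module, the text-file format, and the toy feature values are all assumptions made for the example.

```python
import os
import tempfile
from collections import defaultdict


def flush_module(features, module_name, output_dir, file_name):
    """Hypothetical helper: write one module's pending batches to disk,
    then clear only that module's buffer."""
    module_dir = os.path.join(output_dir, module_name)
    # Create the target path first; writing into a missing directory fails.
    os.makedirs(module_dir, exist_ok=True)
    path = os.path.join(module_dir, file_name)
    with open(path, "w") as f:
        f.write(repr(features[module_name]))
    # Reset only this module. Rebinding features = defaultdict(list) here
    # would also drop the pending batches of every other module.
    features[module_name] = []
    return path


# Toy stand-ins for per-module feature batches.
features = defaultdict(list)
features["layer1"].append([0.1, 0.2])
features["layer2"].append([0.3, 0.4])

with tempfile.TemporaryDirectory() as tmp:
    saved = flush_module(features, "layer1", tmp, "features_0.txt")
    saved_exists = os.path.exists(saved)
```

After the call, the buffer for layer1 is empty while the pending batch for layer2 is still in place.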

NEW: Dataloader modifier

This PR adds a context manager for PyTorch feature extraction. It assumes that the dataloader returns either an image or a tuple or list of the form (image, *args). For a tuple or list, only the images are returned. Otherwise, the class does not modify the dataloader output (e.g., for specialized dataloaders).
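
The behavior described above could look roughly like the sketch below. It is not the class added in this PR: the name ImagesOnly is hypothetical, and plain Python values stand in for a PyTorch dataloader and its image batches so the example runs without torch.

```python
class ImagesOnly:
    """Hypothetical wrapper: yields only the first element of tuple/list batches."""

    def __init__(self, dataloader):
        self.dataloader = dataloader

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # do not suppress exceptions raised inside the block

    def __iter__(self):
        for batch in self.dataloader:
            if isinstance(batch, (tuple, list)):
                # (image, *args) or [image, *args]: keep the images only.
                yield batch[0]
            else:
                # Anything else is passed through unchanged.
                yield batch

    def __len__(self):
        return len(self.dataloader)


# Stand-in batches: a tuple, a list, and a bare "image".
batches = [("img0", 3), ["img1", 7, "meta"], "img2"]
with ImagesOnly(batches) as loader:
    images = list(loader)
```

Note that this sketch treats every tuple or list batch as (image, *args), so a dataloader that yields a plain list of images would need different handling.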

@lciernik lciernik self-assigned this Aug 11, 2025
@lciernik lciernik added the bug and enhancement labels Aug 11, 2025
@lciernik lciernik requested a review from lukasthede August 11, 2025 09:21

@lucaeyring lucaeyring left a comment

Looks good to me, thanks for the fixes!!

@MarcoMorik MarcoMorik left a comment

Looks good to me

@codecov

codecov bot commented Aug 11, 2025

Codecov Report

❌ Patch coverage is 54.00000% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.36%. Comparing base (a1a1092) to head (bae13a4).
⚠️ Report is 5 commits behind head on master.

Files with missing lines | Patch % | Lines
thingsvision/core/extraction/base.py | 8.00% | 23 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #191   +/-   ##
=======================================
  Coverage   75.36%   75.36%           
=======================================
  Files          40       40           
  Lines        2139     2172   +33     
  Branches      272      276    +4     
=======================================
+ Hits         1612     1637   +25     
- Misses        429      437    +8     
  Partials       98       98           
Flag | Coverage Δ
unittests | 75.36% <54.00%> (+<0.01%) ⬆️

@MarcoMorik MarcoMorik left a comment

Great Tests

@LukasMut LukasMut left a comment

LGTM

@LukasMut LukasMut added this pull request to the merge queue Aug 11, 2025
Merged via the queue into master with commit ac156c1 Aug 11, 2025
5 of 7 checks passed
@lciernik lciernik deleted the bug/fix-feature-extraction-bugs branch November 12, 2025 20:48

5 participants