Fix low recall when `limit_val_batches` is set #1298
vickysharma-prog wants to merge 6 commits into weecology:main
Conversation
Thanks for your contribution, here are some comments:
- Please scope the PR to only the issue (remove the .gitignore changes; we could include those in another submission).
- Please test with non-deprecated eval calls. Use `m.create_trainer()` with limit batches set as an argument and then call `m.trainer.validate(m)` or `m.trainer.fit(m)`. You may need to set the validation interval to 1. This would better reflect a training scenario.
- As above, this code should work with `.validate()`; `__evaluate__` is not called during training.
- The code for `main` is a little defensive. The trainer is created on `init`, so it's almost impossible to run this function without `self.trainer` existing.
- The test case does not adequately check the behavior. For example, you have asserted non-negative recall, but this does not prove that recall accurately reflects the limited dataframe.
- Please remove the LLM-inserted "fixes" and issue number from the comment in `main` (L1023); this is unnecessary.
- Is this code correct in multi-GPU environments? I don't think my suggestion in the issue is correct in this case. It's probably best for this logic to go in the RecallPrecision metric. You can filter `ground_df` by the `image_paths` that the metric was called on, which is more reliable in multi-GPU.
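The metric-level filtering suggested above could be sketched roughly as follows. This is a hypothetical pandas helper; the function name and the `image_path` column name are assumptions based on this discussion, not DeepForest's actual implementation:

```python
import pandas as pd


def filter_ground_df(ground_df: pd.DataFrame, seen_image_paths) -> pd.DataFrame:
    """Keep only ground-truth rows for images the metric was actually called on.

    Because each process filters by the images it saw, this stays consistent
    under multi-GPU sharding, unlike trimming by a global batch count.
    """
    return ground_df[ground_df["image_path"].isin(set(seen_image_paths))]
```

The key point is that the filter is driven by what the metric observed at call time, so it does not need to know how the dataloader was limited or sharded.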
Thanks for the detailed feedback @jveitchmichaelis! Really appreciate the thorough review.
Regarding the architectural suggestion:
Pushed the updates:
Just pushed another commit removing the defensive checks. Current changes:
Regarding the RecallPrecision metric refactor: I searched the codebase and found

(The ReadTheDocs failure seems to be a dependency issue unrelated to my changes.)
Please update your
Force-pushed 3d832f8 to a4d9fe9
Rebased on latest main; all checks passing now!
I found the RecallPrecision logic in
pre-commit.ci autofix

for more information, see https://pre-commit.ci
Moved the fix to
Description
When `limit_val_batches` is set (e.g., 0.1 for 10%), evaluation loads the full ground truth CSV, but predictions only cover the limited set of images. This makes recall look very low because of "missing" predictions for images that were never processed.

Added a check in `__evaluate__` that trims `ground_df` based on the `limit_val_batches` value. Uses `ceil(limit_val_batches * n_images)`, as suggested in the issue. Also added a test case to verify the fix.
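The trimming described above could look roughly like this. This is a minimal sketch with a hypothetical helper name; it assumes `ground_df` has an `image_path` column and that the validation loader yields one image per batch, which is not guaranteed in the real codebase:

```python
import math

import pandas as pd


def trim_ground_df(ground_df: pd.DataFrame, limit_val_batches) -> pd.DataFrame:
    """Trim ground truth to the images evaluation will actually process."""
    image_paths = ground_df["image_path"].unique()
    if isinstance(limit_val_batches, float) and 0.0 < limit_val_batches < 1.0:
        # Fractional limit: keep ceil(fraction * n_images) images,
        # matching Lightning's rounding for fractional limits.
        n_keep = math.ceil(limit_val_batches * len(image_paths))
    elif isinstance(limit_val_batches, int) and 0 < limit_val_batches < len(image_paths):
        # Integer limit: Lightning treats it as an absolute batch count.
        n_keep = limit_val_batches
    else:
        # None, 1.0, or a limit covering everything: evaluate the full set.
        return ground_df
    kept = set(image_paths[:n_keep])
    return ground_df[ground_df["image_path"].isin(kept)]
```

Note that PyTorch Lightning interprets a float `limit_val_batches` as a fraction of batches and an int as an absolute batch count, so both cases need handling; the reviewer's point stands that under multi-GPU sharding a global slice like `image_paths[:n_keep]` may not match the images each rank actually saw.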
Related Issue(s)
Fixes #1232
AI-Assisted Development
AI tools used (if applicable):
Used for initial research and understanding the codebase structure