
Fix low recall when limit_val_batches is set #1298

Open

vickysharma-prog wants to merge 6 commits into weecology:main from vickysharma-prog:fix-limit-batches-recall-1232

Conversation

@vickysharma-prog

Description

When limit_val_batches is set (e.g., 0.1 for 10%), evaluation loads the full ground truth CSV but predictions only cover the limited images. This makes recall look very low because of "missing" predictions for images that were never processed.

Added a check in __evaluate__ that trims ground_df based on the limit_val_batches value, keeping ceil(limit_val_batches * n_images) images as suggested in the issue. A rough sketch of the change is below.
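This is roughly the shape of the change (a minimal sketch; variable names and the exact placement inside __evaluate__ are illustrative):

```python
import math

# Sketch only: keep ground truth for the first ceil(limit_val_batches * n_images)
# images, so recall is computed only against images that were actually validated.
if isinstance(limit_val_batches, float) and 0 < limit_val_batches < 1.0:
    image_paths = ground_df["image_path"].unique()
    n_keep = math.ceil(limit_val_batches * len(image_paths))
    ground_df = ground_df[ground_df["image_path"].isin(image_paths[:n_keep])]
```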

Also added a test case to verify the fix.

Related Issue(s)

Fixes #1232

AI-Assisted Development

  • I used AI tools (e.g., GitHub Copilot, ChatGPT, etc.) in developing this PR
  • I understand all the code I'm submitting
  • I have reviewed and validated all AI-generated code

AI tools used (if applicable):
Used for initial research and understanding the codebase structure

Collaborator

@jveitchmichaelis left a comment

Thanks for your contribution; here are some comments:

  • Please scope the PR to only the issue (remove the .gitignore changes; we could include those in a separate submission).
  • Please test with non-deprecated eval calls. Use m.create_trainer() with limit batches set as an argument and then call m.trainer.validate(m) or m.trainer.fit(m). You may need to set the validation interval to 1. This would better reflect a training scenario (a sketch follows after this list).
  • As above, this code should work with .validate(); __evaluate__ is not called during training.
  • The code in main is a little defensive. The trainer is created on init, so it is almost impossible to reach this function without self.trainer existing.
  • The test case does not adequately check the behavior. For example, you have asserted non-negative recall, but that does not prove the recall accurately reflects the limited dataframe.
  • Please remove the LLM-inserted "fixes" comment and issue number from main (L1023); this is unnecessary.
  • Is this code correct in multi-GPU environments? I don't think my suggestion in the issue is correct in this case.
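Here's roughly what I mean for the second point above (a minimal sketch; the config keys and paths are illustrative and may differ by version, and I'm assuming create_trainer() forwards keyword arguments to the Lightning Trainer):

```python
from deepforest import main

m = main.deepforest()
# Hypothetical validation data; point these at the fixtures the tests already use.
m.config["validation"]["csv_file"] = "annotations.csv"
m.config["validation"]["root_dir"] = "images/"
m.config["validation"]["val_accuracy_interval"] = 1  # key name is an assumption

# limit_val_batches is passed through to the Lightning Trainer.
m.create_trainer(limit_val_batches=0.1)
results = m.trainer.validate(m)
```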

It's probably best for this logic to go in the RecallPrecision metric. You can filter ground_df by the image_paths that the metric was called on, which is more reliable in multi-GPU.
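Inside the metric, that would look something like this (attribute names are illustrative, not the actual implementation):

```python
# Sketch only: restrict the ground truth to images the metric actually saw,
# which holds regardless of how Lightning limited or sharded the batches.
seen = set(self.image_paths)  # image paths accumulated during update()
ground_df = ground_df[ground_df["image_path"].isin(seen)]
```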

@vickysharma-prog
Author

Thanks for the detailed feedback @jveitchmichaelis! Really appreciate the thorough review.
Quick acknowledgments:

  1. ✅ Will remove .gitignore changes
  2. ✅ Will remove the "fixes #1232" (Evaluation reports spuriously low recall if limit_batches is set) comment from the code
  3. ✅ Will make the code less defensive

Regarding the architectural suggestion:

> It's probably best for this logic to go in the RecallPrecision metric. You can filter ground_df by the image_paths that the metric was called on which is more reliable in multi-GPU.

This makes sense - filtering at the metric level based on the image_paths that were actually predicted would be more reliable than calculating a cutoff from limit_val_batches. Could you point me to where the RecallPrecision metric is defined so I can refactor the fix there?
For the test - I'll update it to use create_trainer() with limit_val_batches and call trainer.validate() instead of the deprecated __evaluate__ method.
Let me know if I'm understanding correctly!

@vickysharma-prog
Author

Pushed the updates:

  • Removed .gitignore changes
  • Updated comment in main.py
  • Updated test to use create_trainer() + trainer.validate()

Note: the ReadTheDocs build seems to be failing on dependency install (uv sync --extra docs) - this appears unrelated to my changes. Let me know if I need to do anything on my end.

Still working on understanding where the RecallPrecision metric lives for the architectural refactor you suggested.

@vickysharma-prog
Author

Just pushed another commit - removed the defensive checks (hasattr and getattr), since the trainer always exists.

Current changes:

  • ✅ Removed .gitignore changes
  • ✅ Removed issue reference from code comment
  • ✅ Made code less defensive
  • ✅ Updated test to use create_trainer() + trainer.validate()

Regarding the RecallPrecision metric refactor - I searched the codebase and found iou_metric and mAP_metric (from torchmetrics), but couldn't find a custom RecallPrecision metric, so I'm still working out where to apply the refactor you suggested.

(The ReadTheDocs failure seems to be a dependency issue unrelated to my changes)

@jveitchmichaelis
Collaborator

jveitchmichaelis commented Feb 5, 2026

> Still working on understanding the RecallPrecision metric location for the architectural refactor you suggested.

Please update your main branch + rebase this one

@vickysharma-prog force-pushed the fix-limit-batches-recall-1232 branch from 3d832f8 to a4d9fe9 on February 5, 2026 at 14:08
@vickysharma-prog
Author

Rebased on the latest main; all checks are passing now!
Let me know if there's anything else to address.

@vickysharma-prog
Author

I found the RecallPrecision logic in metrics.py.
From what I can see, the filtering could live in the metric itself by restricting ground_df to the image_paths actually seen by the metric before calling __evaluate_wrapper__. I was thinking this would happen in compute(), but I wanted to double-check that this aligns with the intended flow (vs. doing it earlier in update()).
Does this sound like the right place to apply the fix?

@jveitchmichaelis
Collaborator

  • Please review your test case. Hint: can you demonstrate this would fail before your fix? (A rough sketch of what such a test could check follows below.)
  • You cannot handle this in update(), as the underlying issue happens when __evaluate_wrapper__ is called on the full ground truth dataframe.
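For example, something along these lines (the fixture, metric key, and tolerance are assumptions, not a drop-in test):

```python
import pytest

def test_limit_val_batches_recall(m):  # m: a configured deepforest model fixture
    # Baseline: validate on the full dataset.
    m.create_trainer(limit_val_batches=1.0)
    full = m.trainer.validate(m)[0]

    # Limited run: only half of the validation batches are seen.
    m.create_trainer(limit_val_batches=0.5)
    limited = m.trainer.validate(m)[0]

    # Before the fix, recall in the limited run drops sharply because unseen
    # images count as missed detections; after the fix it stays comparable.
    assert limited["box_recall"] == pytest.approx(full["box_recall"], abs=0.1)
```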

@vickysharma-prog
Author

pre-commit.ci autofix

@vickysharma-prog
Author

Moved the fix to RecallPrecision.compute() in metrics.py as suggested - it now filters ground_df to include only images that were actually predicted, before calling __evaluate_wrapper__.
Removed the old fix from main.py and updated the test.
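Roughly, the change looks like this (a sketch, not the exact diff - the attribute names and the wrapper's keyword arguments are assumptions based on this thread):

```python
import pandas as pd

# Sketch of RecallPrecision.compute() in metrics.py.
def compute(self):
    predictions = pd.concat(self.predictions)  # per-batch predictions from update()

    # Keep only ground truth for images that were actually predicted, so
    # limit_val_batches no longer turns unseen images into missed detections.
    seen = predictions["image_path"].unique()
    ground_df = self.ground_df[self.ground_df["image_path"].isin(seen)]

    return __evaluate_wrapper__(
        predictions=predictions,
        ground_df=ground_df,
        iou_threshold=self.iou_threshold,
    )
```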
