Fix recovery in datasets #527

MaxiBoether · 2024-06-19T14:09:01Z

Before, we had recovery logic based on reply indices. However, while working on storage, I realized that those responses arrive non-deterministically from multiple threads, so we cannot rely on their ordering. Instead, we need to keep track of the sample IDs we have already yielded. I changed the logic to keep a list, which is cheap to append to, and to convert it to a set (hash table) only once we have failed and actually need to do many `in` checks.
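As a rough illustration of the idea described above (not the code in this PR), the list-then-set approach could be sketched as follows. The names `yield_with_recovery`, `fetch`, and `max_retries` are hypothetical, and `ConnectionError` stands in for whatever error the real storage request raises.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Set


def yield_with_recovery(
    fetch: Callable[[], Iterable[int]],
    max_retries: int = 3,
) -> Iterator[int]:
    """Yield every sample ID exactly once, even if `fetch` fails midway.

    `fetch` is a hypothetical stand-in for the storage request. It may
    return the IDs in a different order on every call, so we cannot
    resume by index and must remember the IDs themselves.
    """
    yielded: List[int] = []          # cheap appends while nothing has failed
    seen: Optional[Set[int]] = None  # built lazily, only after a failure
    attempts = 0
    while True:
        try:
            for sample_id in fetch():
                if seen is not None:
                    # Recovery mode: skip IDs that were already yielded.
                    if sample_id in seen:
                        continue
                    seen.add(sample_id)
                else:
                    yielded.append(sample_id)
                yield sample_id
            return
        except ConnectionError:
            attempts += 1
            if attempts > max_retries:
                raise
            if seen is None:
                # First failure: pay the conversion cost once so that the
                # many membership checks that follow are O(1) on average.
                seen = set(yielded)
```

In the error-free case this only ever appends to a list; the set is built at most once, on the first failure.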

github-actions · 2024-06-19T14:40:02Z


codecov · 2024-06-19T14:53:21Z

Codecov Report

Attention: Patch coverage is 55.55556% with 20 lines in your changes missing coverage. Please review.

Project coverage is 82.37%. Comparing base (cb0be37) to head (6017c37).

Files	Patch %	Lines
...n/evaluator/internal/dataset/evaluation_dataset.py	50.00%	14 Missing ⚠️
.../trainer_server/internal/dataset/online_dataset.py	64.70%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #527      +/-   ##
==========================================
- Coverage   82.49%   82.37%   -0.13%     
==========================================
  Files         215      215              
  Lines       10054    10080      +26     
==========================================
+ Hits         8294     8303       +9     
- Misses       1760     1777      +17


XianzheMa

My question is about the `raise e` in all four changed methods. In this case, it makes sense to add a unit test to ensure everything works in every case (the error-free case and the case with an error).
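A minimal sketch of what such tests might look like, assuming a simplified helper rather than the actual dataset classes. `fetch_all_with_retry` and `get_ids` are hypothetical names, and `ConnectionError` is a placeholder for the real error type. The three tests cover the error-free case, recovery after one error, and re-raising once retries are exhausted.

```python
from unittest import mock


def fetch_all_with_retry(get_ids, max_retries=1):
    """Hypothetical helper: collect each ID once, retrying on ConnectionError."""
    collected, attempts = [], 0
    while True:
        try:
            seen = set(collected)
            for sample_id in get_ids():
                if sample_id not in seen:
                    collected.append(sample_id)
                    seen.add(sample_id)
            return collected
        except ConnectionError:
            attempts += 1
            if attempts > max_retries:
                raise  # give up and surface the original error


def test_error_free_case():
    get_ids = mock.Mock(return_value=[1, 2, 3])
    assert fetch_all_with_retry(get_ids) == [1, 2, 3]
    assert get_ids.call_count == 1


def test_recovers_after_one_error():
    # The first call raises, the second returns the IDs in another order.
    get_ids = mock.Mock(side_effect=[ConnectionError("simulated"), [2, 1, 3]])
    assert fetch_all_with_retry(get_ids) == [2, 1, 3]
    assert get_ids.call_count == 2


def test_reraises_when_retries_exhausted():
    get_ids = mock.Mock(side_effect=ConnectionError("simulated"))
    try:
        fetch_all_with_retry(get_ids, max_retries=1)
    except ConnectionError:
        pass
    else:
        raise AssertionError("expected ConnectionError to propagate")
    assert get_ids.call_count == 2
```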

modyn/evaluator/internal/dataset/evaluation_dataset.py

XianzheMa · 2024-06-20T09:31:09Z

Can you piggyback deleting this log

modyn/modyn/supervisor/internal/pipeline_executor/pipeline_executor.py

Line 480 in 55fef7c

logger.info(f"Processing {len(s.triggers)} triggers in this batch.")

in this PR? This log is basically wrong because len(s.triggers) is not the number of triggers we are processing this time.

fix dataest recovery

97ffb9c

MaxiBoether requested a review from XianzheMa June 19, 2024 19:50

Merge branch 'main' into fix/MaxiBoether/recovery

d3377cc

XianzheMa requested changes Jun 19, 2024

modyn/evaluator/internal/dataset/evaluation_dataset.py

XianzheMa self-requested a review June 20, 2024 09:31

XianzheMa approved these changes Jun 20, 2024

XianzheMa and others added 3 commits June 20, 2024 11:41

Merge branch 'main' into fix/MaxiBoether/recovery

96e1c61

integrate xianzhe's suggestion

7822e8b

Merge branch 'main' into fix/MaxiBoether/recovery

6017c37

MaxiBoether merged commit c0b0ae8 into main Jun 20, 2024
20 checks passed

MaxiBoether deleted the fix/MaxiBoether/recovery branch June 20, 2024 16:32
