Fix recovery in datasets #527

Merged
merged 5 commits into from
Jun 20, 2024


MaxiBoether
Contributor

Previously, the recovery logic was based on reply indices. However, while working on storage, I realized that these responses arrive non-deterministically from multiple threads, so we cannot rely on their order. Instead, we need to keep track of the sample IDs we have already yielded. I changed the logic to keep a plain list, which is cheap to append to, and to convert it to a set (hash table) only once a request has failed and we actually need to perform many membership checks.
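The list-then-set idea can be sketched as a generator. This is a minimal sketch under stated assumptions, not the code merged in this PR: stream_with_recovery and fetch (a callable returning (sample_id, payload) pairs in arbitrary order) are hypothetical names, and ConnectionError stands in for whatever error the real storage client raises. It also assumes a single, uninterrupted stream never repeats a sample ID, so no membership checks are needed until the first failure.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Set, Tuple

Sample = Tuple[int, bytes]  # (sample_id, payload); hypothetical shape


def stream_with_recovery(
    fetch: Callable[[], Iterable[Sample]], max_retries: int = 3
) -> Iterator[Sample]:
    """Yield every sample exactly once, re-fetching after a failed stream."""
    yielded_list: List[int] = []            # happy path: cheap O(1) appends
    yielded_set: Optional[Set[int]] = None  # built only after the first failure

    for attempt in range(max_retries + 1):
        try:
            for sample_id, payload in fetch():
                if yielded_set is not None:
                    # Recovery mode: the retried stream may resend samples in any
                    # order, so filter by sample ID rather than by position.
                    if sample_id in yielded_set:
                        continue
                    yielded_set.add(sample_id)
                else:
                    yielded_list.append(sample_id)
                yield sample_id, payload
            return  # stream completed without error
        except ConnectionError:  # stand-in for the real storage error type
            if attempt == max_retries:
                raise
            if yielded_set is None:
                yielded_set = set(yielded_list)  # one-time O(n) conversion
```

Converting only on failure keeps the common path at one list append per sample; the set is built once in O(n), after which each membership check is O(1) on average.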


Line Coverage: -% ( % to main)
Branch Coverage: -% ( % to main)


codecov bot commented Jun 19, 2024

Codecov Report

Attention: Patch coverage is 55.55556% with 20 lines in your changes missing coverage. Please review.

Project coverage is 82.37%. Comparing base (cb0be37) to head (6017c37).

Files                                                    Patch %   Lines
...n/evaluator/internal/dataset/evaluation_dataset.py    50.00%    14 Missing ⚠️
.../trainer_server/internal/dataset/online_dataset.py    64.70%    6 Missing ⚠️
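As a sanity check, the per-file figures add up to the reported patch coverage if patch coverage is taken to be hit / (hit + missing) changed lines, ignoring partially covered lines (an assumption). The hit counts below (14 of 28 and 11 of 17 lines) are inferred from the percentages and are not shown in the report; coverage_pct and truncate_2dp are illustrative helpers.

```python
import math


def coverage_pct(hit: int, missing: int) -> float:
    """Line coverage in percent: hit / (hit + missing). Partial lines are ignored."""
    return 100.0 * hit / (hit + missing)


def truncate_2dp(pct: float) -> float:
    """Round down at two decimals (the table's 64.70% for 11/17 suggests truncation)."""
    return math.floor(pct * 100) / 100


# (hit, missing) per file: missing counts come from the report,
# hit counts are inferred from the reported percentages.
patch_lines = {
    "evaluation_dataset.py": (14, 14),
    "online_dataset.py": (11, 6),
}

for name, (hit, missing) in patch_lines.items():
    print(f"{name}: {truncate_2dp(coverage_pct(hit, missing)):.2f}%")

total_hit = sum(hit for hit, _ in patch_lines.values())
total_missing = sum(missing for _, missing in patch_lines.values())
print(f"patch: {coverage_pct(total_hit, total_missing):.5f}% ({total_missing} missing)")
```

The totals, 25 of 45 changed lines hit, reproduce the 55.55556% with 20 missing lines; 11/17 is about 64.706%, which the table shows as 64.70% rather than 64.71%.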
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #527      +/-   ##
==========================================
- Coverage   82.49%   82.37%   -0.13%     
==========================================
  Files         215      215              
  Lines       10054    10080      +26     
==========================================
+ Hits         8294     8303       +9     
- Misses       1760     1777      +17     
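The project-level rows can be checked the same way from Hits and Lines. This sketch assumes coverage is Hits / Lines and that displayed values are rounded down at two decimals; that assumption reproduces the reported -0.13%, whereas ordinary rounding of the difference (about -0.1235 percentage points) would give -0.12%. The helper names are illustrative.

```python
import math


def floor_2dp(pct: float) -> float:
    """Round down (toward negative infinity) at two decimal places."""
    return math.floor(pct * 100) / 100


def coverage(hits: int, lines: int) -> float:
    return 100.0 * hits / lines


base_hits, base_lines = 8294, 10054  # main
head_hits, head_lines = 8303, 10080  # #527

base = coverage(base_hits, base_lines)  # about 82.4945
head = coverage(head_hits, head_lines)  # about 82.3710
print(f"main  {floor_2dp(base):.2f}%  misses {base_lines - base_hits}")
print(f"#527  {floor_2dp(head):.2f}%  misses {head_lines - head_hits}")
print(f"delta {floor_2dp(head - base):+.2f}%  (round() would give {round(head - base, 2):+.2f}%)")
```

Misses are simply Lines minus Hits (1760 and 1777), so the +26 lines split into +9 hits and +17 misses.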


MaxiBoether requested a review from XianzheMa on June 19, 2024 19:50
Collaborator

XianzheMa left a comment


My question is about the raise e in all four changed methods. In this case, it would make sense to add a unit test to ensure everything works in every case (both the error-free case and the case with an error).
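One way to cover both cases is a small unittest suite around a stand-in generator. Everything here is hypothetical: read_with_recovery mimics the recovery-plus-raise e pattern on plain integer sample IDs, scripted replays canned streams, and ConnectionError stands in for the real storage error. The three tests check the error-free path, recovery from a failure whose retried stream arrives in a different order, and that the error is re-raised once retries are exhausted.

```python
import unittest
from typing import Callable, Iterable, Iterator, List, Optional, Set


def read_with_recovery(fetch: Callable[[], Iterable[int]], max_retries: int = 2) -> Iterator[int]:
    """Hypothetical stand-in: yield each sample ID once, retrying on ConnectionError."""
    yielded: List[int] = []
    seen: Optional[Set[int]] = None
    for attempt in range(max_retries + 1):
        try:
            for sample_id in fetch():
                if seen is not None:
                    if sample_id in seen:
                        continue  # already delivered before the failure
                    seen.add(sample_id)
                else:
                    yielded.append(sample_id)
                yield sample_id
            return
        except ConnectionError as e:
            if attempt == max_retries:
                raise e  # the re-raise the tests below must cover
            if seen is None:
                seen = set(yielded)


def scripted(*streams):
    """Return a fetch() that replays one stream per call; Exception items are raised."""
    it = iter(streams)

    def fetch():
        for item in next(it):
            if isinstance(item, Exception):
                raise item
            yield item

    return fetch


class RecoveryTest(unittest.TestCase):
    def test_error_free(self):
        fetch = scripted([1, 2, 3])
        self.assertEqual(list(read_with_recovery(fetch)), [1, 2, 3])

    def test_recovers_without_duplicates(self):
        # The retried stream resends samples in a different order.
        fetch = scripted([2, 1, ConnectionError()], [3, 1, 2, 4])
        self.assertEqual(list(read_with_recovery(fetch)), [2, 1, 3, 4])

    def test_reraises_after_retries(self):
        fetch = scripted([ConnectionError()], [ConnectionError()], [ConnectionError()])
        with self.assertRaises(ConnectionError):
            list(read_with_recovery(fetch, max_retries=2))
```

Saved as a module, the suite runs with python -m unittest.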

@XianzheMa
Collaborator

Can you piggyback deleting this log

logger.info(f"Processing {len(s.triggers)} triggers in this batch.")
in this PR? This log is basically wrong because len(s.triggers) is not the number of triggers we are processing in this batch.

XianzheMa self-requested a review on June 20, 2024 09:31
MaxiBoether merged commit c0b0ae8 into main on June 20, 2024
20 checks passed
MaxiBoether deleted the fix/MaxiBoether/recovery branch on June 20, 2024 16:32
2 participants