Twin RHO Model Step 1: create the Twin RHO Model #547
Conversation
Codecov Report
Attention: Patch coverage is …

```diff
@@            Coverage Diff             @@
##             main     #547      +/-   ##
==========================================
+ Coverage   82.84%   82.92%   +0.08%
==========================================
  Files         220      221       +1
  Lines       10235    10298      +63
==========================================
+ Hits         8479     8540      +61
- Misses       1756     1758       +2
```

View full report in Codecov by Sentry.
The changes mostly look good, but I am a bit confused about who calls `set_extra_state`/updates the current model, and whether we can make this a bit less convoluted. It's not clear to me yet why this dict is necessary. Also, I wonder whether we can avoid calling both models :)
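For context, `get_extra_state`/`set_extra_state` are the standard `torch.nn.Module` hooks for round-tripping non-tensor state through `state_dict()`/`load_state_dict()`. A minimal, self-contained sketch of how the seen-sample-ID bookkeeping could ride along with a checkpoint (the dict layout shown is an assumption for illustration, not necessarily what this PR does):

```python
from typing import Any

from torch import nn


class ExtraStateDemo(nn.Module):
    """Sketch: persist non-tensor bookkeeping via the extra-state hooks."""

    def __init__(self) -> None:
        super().__init__()
        self.linear = nn.Linear(4, 2)
        # One set of seen sample IDs per internal model.
        self.models_seen_ids: list[set[int]] = [set(), set()]

    def get_extra_state(self) -> Any:
        # Included automatically in state_dict() alongside the tensors.
        return {"seen_ids": [sorted(s) for s in self.models_seen_ids]}

    def set_extra_state(self, state: Any) -> None:
        # Called by load_state_dict(); restores the bookkeeping.
        self.models_seen_ids = [set(ids) for ids in state["seen_ids"]]


model = ExtraStateDemo()
model.models_seen_ids[0].update({1, 2, 3})
restored = ExtraStateDemo()
restored.load_state_dict(model.state_dict())
assert restored.models_seen_ids[0] == {1, 2, 3}
```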
Thanks! The updated logic looks good and avoids the double model call when possible. My only remaining comment is regarding the nolints.
This PR is the first of several implementing another way of producing the holdout set, the IL model, and the irreducible loss (an approach typically suitable for small datasets):
Our current architecture only allows one trigger ID to correspond to one model ID. To accommodate two IL models within one trigger, I create a "twin model" that internally consists of two IL models. During training, each IL model memorizes the sample IDs it has seen, so that during evaluation each sample is handled by the IL model that has not seen it.
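To make the routing idea concrete, here is a minimal sketch; names such as `TwinModelSketch`, `make_backbone`, and the `(x, sample_ids)` forward signature are hypothetical illustrations, not the PR's actual interface:

```python
import torch
from torch import nn


class TwinModelSketch(nn.Module):
    """Sketch: two IL models; each remembers the sample IDs it was trained on."""

    def __init__(self, make_backbone) -> None:
        super().__init__()
        self.models = nn.ModuleList([make_backbone(), make_backbone()])
        self.models_seen_ids: list[set[int]] = [set(), set()]
        self.current_model = 0  # which twin the current training pass updates

    def forward(self, x: torch.Tensor, sample_ids: list[int]) -> torch.Tensor:
        if self.training:
            # Only the twin of the current pass sees (and memorizes) these samples.
            self.models_seen_ids[self.current_model].update(sample_ids)
            return self.models[self.current_model](x)
        # Evaluation: each sample is scored by the twin that has NOT seen it,
        # so its predicted loss behaves like a holdout (irreducible) loss.
        # Assumes [batch, num_classes] outputs.
        out0, out1 = self.models[0](x), self.models[1](x)
        seen_by_0 = torch.tensor(
            [sid in self.models_seen_ids[0] for sid in sample_ids],
            device=x.device,
        ).unsqueeze(-1)
        return torch.where(seen_by_0, out1, out0)
```

Note that this naive version always runs both twins at evaluation time; per the review discussion above, the PR's updated logic avoids the double call when possible.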
How it works
1. `RHOLossDownsamplingStrategy` randomly samples half of the training set and marks the `used` column in the `selector_state_metadata` table as `True` for those samples. The strategy issues a request to train a `RHOLOSSTwinModel` on this TSS. (unimplemented)
2. `RHOLOSSTwinModel` is instantiated. Only the 0th model is trained on this dataset (implemented in this PR).
3. `RHOLossDownsamplingStrategy` produces the other half of the training set by selecting the samples with `used == False`. The strategy issues a request to finetune this twin model. (unimplemented)
4. `RHOLOSSTwinModel` is instantiated again. Only the 1st model is trained on this dataset (implemented in this PR).
5. The `used` flags are then reset.

This is admittedly not the optimal way to train a twin RHO model, but it is very straightforward, and we can optimize it depending on how well it performs.
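A hedged sketch of how the two training requests could map onto the two internal models, following the steps above; the `strategy`/`trainer` helpers are hypothetical placeholders for Modyn components, and the switch-on-second-pass rule is my reading of the flow, not the PR's exact code:

```python
def run_twin_training(strategy, trainer) -> None:
    # Steps 1-2: train the 0th model on the randomly chosen half (used == True).
    first_half = strategy.sample_half_and_mark_used()  # hypothetical helper
    twin = trainer.instantiate_model()
    twin.current_model = 0
    trainer.train(twin, first_half)
    checkpoint = twin.state_dict()  # carries weights plus the seen-ID extra state

    # Steps 3-4: finetune on the complementary half (used == False);
    # only the 1st model is trained in this pass.
    second_half = strategy.select_unused_half()  # hypothetical helper
    twin = trainer.instantiate_model()
    twin.load_state_dict(checkpoint)  # restores model 0's weights and seen IDs
    twin.current_model = 1
    trainer.train(twin, second_half)
```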
Current drawbacks
Due to its reliance on the `used` field, `RHOLoss` is currently not compatible with presampling strategies that also use the `used` field, such as `FreshnessSamplingStrategy`.
Next PR
Implementing steps 1 and 3: preparing the split holdout set.
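A hedged sketch of what steps 1 and 3 could look like against the `selector_state_metadata` table, using SQLAlchemy. The table name and `used` column come from the description above; the ORM class, the `sample_key`/`pipeline_id` columns, and the session handling are assumptions about Modyn's internals, for illustration only:

```python
import random

from sqlalchemy import select, update


def mark_random_half_used(session, SelectorStateMetadata, pipeline_id: int) -> None:
    # Step 1: choose a random half of the samples and flag them used = True.
    keys = session.scalars(
        select(SelectorStateMetadata.sample_key).where(
            SelectorStateMetadata.pipeline_id == pipeline_id
        )
    ).all()
    half = random.sample(keys, len(keys) // 2)
    session.execute(
        update(SelectorStateMetadata)
        .where(SelectorStateMetadata.sample_key.in_(half))
        .values(used=True)
    )
    session.commit()


def select_unused_half(session, SelectorStateMetadata, pipeline_id: int) -> list:
    # Step 3: the complementary half is everything still flagged used == False.
    return session.scalars(
        select(SelectorStateMetadata.sample_key).where(
            SelectorStateMetadata.pipeline_id == pipeline_id,
            SelectorStateMetadata.used.is_(False),
        )
    ).all()
```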
How to review
All the main logic is in `modyn/models/rho_loss_twin_model/rho_loss_twin_model.py`.