
Twin RHO Model Step 1: create the Twin RHO Model #547

Merged
merged 18 commits into main on Jul 1, 2024

Conversation

@XianzheMa (Collaborator) commented Jun 24, 2024

This PR is the first in a series implementing another way of producing the holdout set, the IL models, and the irreducible losses (typically suitable for small datasets):

Split the training set into two halves and train two IL models, each on one half. Each model provides the irreducible loss for the samples it was not trained on. The main model is still trained on the original training set (CIFAR10, CIFAR100, CINIC-10).

Our current architecture only allows one trigger id to correspond to one model id. To accommodate two IL models within one trigger, I created a "twin model" that internally consists of two IL models. During training, each IL model memorizes the sample ids it has seen, so that during evaluation each sample is scored by the IL model that has not seen it.
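The routing idea above can be sketched in plain Python. This is a hypothetical, framework-free sketch: the real RHOLOSSTwinModel is a PyTorch module, and `TwinModelSketch` with its method names is purely illustrative.

```python
# Hypothetical sketch of the twin-model routing idea (not the actual Modyn code):
# each inner model remembers the sample ids it was trained on, and at evaluation
# time every sample is scored by the inner model that has NOT seen it.

class TwinModelSketch:
    def __init__(self, model_a, model_b):
        self._models = [model_a, model_b]
        self._seen_ids = [set(), set()]  # sample ids each inner model trained on

    def train_inner(self, model_idx, batch):
        # batch: iterable of (sample_id, features, label) tuples
        for sample_id, *_ in batch:
            self._seen_ids[model_idx].add(sample_id)
        # ... actual training of self._models[model_idx] would happen here ...

    def irreducible_loss(self, sample_id, features, label):
        # Route to the inner model that has not seen this sample;
        # fall back to model 0 if neither model has seen it.
        model_idx = 1 if sample_id in self._seen_ids[0] else 0
        return self._models[model_idx](features, label)
```

The key design point is that the routing is internal: callers treat the twin as a single model, which is what lets the rest of the pipeline stay unchanged (step 6 below).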

How it works

  1. At the selector, RHOLossDownsamplingStrategy randomly samples half of the training set and marks the used column in the selector_state_metadata table as True for those samples. The strategy issues a request to train a RHOLOSSTwinModel on this TSS. (unimplemented)
  2. At the trainer server, RHOLOSSTwinModel is instantiated. Only the model at index 0 is trained on this dataset. (implemented in this PR)
  3. At the selector, RHOLossDownsamplingStrategy produces the other half of the training set by selecting the samples with used == False. The strategy issues a request to finetune this twin model. (unimplemented)
  4. At the trainer server, RHOLOSSTwinModel is instantiated again. Only the model at index 1 is trained on this dataset. (implemented in this PR)
  5. At the selector, (optionally) clear the used flags.
  6. At the trainer server, when training the main model: nothing needs to change, as the logic is handled internally by the twin model.
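The selector-side split (steps 1, 3, and 5) can be sketched as follows. This is a hypothetical helper, not the real RHOLossDownsamplingStrategy; `split_holdout` and its signature are illustrative, and the real strategy would mark the used column in the selector_state_metadata table rather than keep an in-memory set.

```python
import random

def split_holdout(sample_ids, seed=42):
    """Sketch of the selector-side flow: mark a random half of the samples
    as 'used', train inner model 0 on that half, then train inner model 1
    on the remaining half (hypothetical helper, not the real Modyn API)."""
    rng = random.Random(seed)
    # Step 1: randomly sample half of the training set and mark it used.
    used = set(rng.sample(sample_ids, len(sample_ids) // 2))
    first_half = [s for s in sample_ids if s in used]       # trains model at index 0
    # Step 3: the other half is everything with used == False.
    second_half = [s for s in sample_ids if s not in used]  # trains model at index 1
    # Step 5 would then clear the used flags so later strategies start fresh.
    return first_half, second_half
```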

This is clearly not the optimal way to train a twin RHO model, but it is very straightforward, and we can optimize it depending on how well it performs.

Current drawbacks

Because it relies on the used field, RHOLoss is currently incompatible with presampling strategies that also use the used field, such as FreshnessSamplingStrategy.

Next PR

Implementing steps 1 and 3: preparing the split holdout set.

How to review

All the main logic is in modyn/models/rho_loss_twin_model/rho_loss_twin_model.py


codecov bot commented Jun 24, 2024

Codecov Report

Attention: Patch coverage is 97.36842% with 2 lines in your changes missing coverage. Please review.

Project coverage is 82.92%. Comparing base (59ea026) to head (2992df2).

Current head 2992df2 differs from pull request most recent head cddc9ea

Please upload reports for the commit cddc9ea to get more accurate results.

Files Patch % Lines
...ig/schema/pipeline/sampling/downsampling_config.py 83.33% 1 Missing ⚠️
...pling_strategies/rho_loss_downsampling_strategy.py 87.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #547      +/-   ##
==========================================
+ Coverage   82.84%   82.92%   +0.08%     
==========================================
  Files         220      221       +1     
  Lines       10235    10298      +63     
==========================================
+ Hits         8479     8540      +61     
- Misses       1756     1758       +2     


@XianzheMa XianzheMa changed the title Twin RHO Model Twin RHO Model Step 1: create the Twin RHO Model Jun 25, 2024
@XianzheMa XianzheMa requested a review from MaxiBoether June 25, 2024 14:41
@XianzheMa XianzheMa marked this pull request as ready for review June 25, 2024 14:42
@MaxiBoether (Contributor) left a comment

The changes mostly look good, but I am a bit confused about who calls set_extra_state / updates the current model, and whether we can make this a bit less convoluted. It's not clear to me yet why this dict is necessary. Also, I wonder whether we can avoid calling both models :)

Review threads (outdated, resolved) on:
modyn/models/dlrm/dlrm.py
modyn/models/rho_loss_twin_model/rho_loss_twin_model.py (three threads)
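For context on the set_extra_state discussion above: the seen-sample-id sets must survive checkpointing alongside the weights, which is what PyTorch's nn.Module get_extra_state/set_extra_state hooks are for. Below is a hypothetical plain-Python analogue of that pattern, not the actual Modyn implementation; `CheckpointableTwinState` is an illustrative name.

```python
# Hypothetical analogue of PyTorch's get_extra_state/set_extra_state hooks:
# non-parameter state (here, the seen-sample-id sets) is serialized into and
# restored from the checkpoint dict alongside the model weights.

class CheckpointableTwinState:
    def __init__(self):
        self.seen_ids = [set(), set()]  # sample ids seen by each inner model

    def get_extra_state(self):
        # Sets are converted to sorted lists so the state is serializable.
        return {"seen_ids": [sorted(s) for s in self.seen_ids]}

    def set_extra_state(self, state):
        # Restore the in-memory sets from the checkpointed lists.
        self.seen_ids = [set(ids) for ids in state["seen_ids"]]
```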
@XianzheMa XianzheMa requested a review from MaxiBoether June 26, 2024 22:13
@MaxiBoether (Contributor) left a comment

Thanks! The updated logic looks good and avoids the double model call when possible. My only comment left is regarding the nolints.

@XianzheMa XianzheMa merged commit 60d3baa into main Jul 1, 2024
11 checks passed
@XianzheMa XianzheMa deleted the XianzheMa/feature/twin-rho-model branch July 3, 2024 21:32