@norabelrose norabelrose commented Aug 8, 2025

This PR does two main things:

  1. We stop using DataLoader2, since it is no longer supported.
  2. We add support for the Muon optimizer, which we've found dramatically speeds up convergence and produces much "better" lenses (in the sense of lower KL loss).
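
For readers unfamiliar with Muon, here is a rough sketch of the core idea as publicly described: apply momentum to the gradient of a 2-D weight matrix, then approximately orthogonalize the result with a few Newton-Schulz iterations before taking the step. This is not the code in this PR. The function names, the NumPy setting, and the default `lr` and `beta` values are assumptions made for illustration. The quintic coefficients are the ones used in the public Muon reference implementation.

```python
import numpy as np

# Quintic Newton-Schulz coefficients from the public Muon reference implementation.
NS_COEFFS = (3.4445, -4.7750, 2.0315)


def newton_schulz_orthogonalize(grad, steps=5, eps=1e-7):
    """Push the singular values of a 2-D matrix toward 1 (approximately, not exactly)."""
    a, b, c = NS_COEFFS
    # Dividing by the Frobenius norm guarantees every singular value is at most 1.
    x = grad / (np.linalg.norm(grad) + eps)
    # Work on the wide orientation so the Gram matrix is as small as possible.
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T
    for _ in range(steps):
        gram = x @ x.T
        # Computes a*x + b*(x x^T) x + c*(x x^T)^2 x in one expression.
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if transposed else x


def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    """One hypothetical Muon-style update for a single 2-D weight matrix."""
    momentum_buf = beta * momentum_buf + grad
    # Nesterov-style lookahead: blend the fresh gradient with the updated buffer.
    update = grad + beta * momentum_buf
    direction = newton_schulz_orthogonalize(update)
    return weight - lr * direction, momentum_buf
```

Because the iteration only makes sense for matrices, a sketch like this would apply to 2-D weights only. Biases and other non-matrix parameters would need a separate optimizer.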

There are also a handful of smaller changes, such as defaulting to a sample from the SmolLM2 corpus instead of the Pile, since the Pile is no longer available on the HF Hub.

@levmckinney levmckinney self-requested a review August 9, 2025 03:01
@norabelrose norabelrose changed the title from "Stop using Dataloader2" to "Support Muon, stop using Dataloader2" Nov 10, 2025