@lapp0 commented Jan 2, 2025

See #5 (comment)

  • Don't skip the first sample.
  • Average val / test loss by token count (how gradients are averaged is unchanged).
  • Don't add 1 to the sequence length (an artifact of the causal LM loss).
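
To illustrate the second point, here is a minimal sketch of token-weighted loss averaging. It is not the code from this PR: the function name `token_weighted_mean` and the example numbers are hypothetical, and it assumes each evaluation batch reports a mean loss over its own tokens plus the number of tokens it contained.

```python
def token_weighted_mean(batch_losses, batch_token_counts):
    """Average per-batch mean losses, weighting each batch by its token count.

    Hypothetical helper for illustration only; assumes batch_losses[i] is the
    mean loss over the batch_token_counts[i] tokens of batch i.
    """
    if len(batch_losses) != len(batch_token_counts):
        raise ValueError("need exactly one token count per batch loss")
    total_tokens = sum(batch_token_counts)
    if total_tokens == 0:
        raise ValueError("no tokens to average over")
    # Recover each batch's summed loss, then divide by the total token count.
    weighted_sum = sum(
        loss * count for loss, count in zip(batch_losses, batch_token_counts)
    )
    return weighted_sum / total_tokens


# Two validation batches of unequal size (made-up numbers).
batch_losses = [2.0, 4.0]
batch_token_counts = [300, 100]

naive = sum(batch_losses) / len(batch_losses)  # 3.0, every batch counts equally
weighted = token_weighted_mean(batch_losses, batch_token_counts)  # 2.5, every token counts equally
print(naive, weighted)
```

With padded batches of varying length, the per-batch mean overweights short batches, while the token-weighted mean matches the loss computed over all tokens at once.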

@lhallee merged commit d94f7c6 into master Jan 3, 2025
@lhallee deleted the padded-dataloader branch June 20, 2025 19:27
3 participants