Support only-within-sequence attention for MosaicGPT #266
Conversation
cc @jfrankle
FYI, I think this PR has picked up some commits from the main branch that maybe weren't part of it.
There are a few things to address + a few nits.
lgtm
Makes it possible to have attention restricted to tokens within the same source sequence when using pre-concatenated text dataloading.
- Adds `eos_token_id` and `bos_token_id` flags for the `text` dataloader, which instruct it to add `"sequence_id"` to the batch using one of those tokens as the separator.
- Adds an `attn_uses_sequence_id` flag to the MosaicGPT config/model and `sequence_id` to the forward arguments.
- Attention is restricted based on `sequence_id` if the model config is set to use that input (a minimal sketch of this masking follows after the links below).

125m comparison: https://wandb.ai/mosaic-ml/v004-125m
3b comparison: https://wandb.ai/mosaic-ml/v004-3b (for profiling; not a full convergence run)
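
For illustration, here is a minimal sketch of how the two pieces could fit together: deriving `sequence_id` from a pre-concatenated token stream using a separator token, and turning it into a within-sequence attention mask. This is not the code in the PR; the helper names `make_sequence_id` and `sequence_id_attention_mask`, and the way the mask is combined with a causal mask, are assumptions written against plain PyTorch.

```python
import torch


def make_sequence_id(input_ids: torch.Tensor, separator_token_id: int) -> torch.Tensor:
    """Label each token in a pre-concatenated batch with the index of its
    source sequence, treating `separator_token_id` (e.g. the EOS token) as
    the end of a sequence. `input_ids` has shape (batch, seq_len)."""
    is_sep = (input_ids == separator_token_id).int()
    # The separator stays with the preceding sequence; tokens after it get the next id.
    return torch.cumsum(is_sep, dim=1) - is_sep


def sequence_id_attention_mask(sequence_id: torch.Tensor) -> torch.Tensor:
    """Boolean mask of shape (batch, seq_len, seq_len), True only where the
    query and key positions belong to the same source sequence."""
    return sequence_id.unsqueeze(-1) == sequence_id.unsqueeze(-2)


# Example: token id 2 plays the role of the EOS separator here (hypothetical values).
batch = torch.tensor([[5, 7, 2, 9, 4, 2, 8]])
seq_id = make_sequence_id(batch, separator_token_id=2)   # [[0, 0, 0, 1, 1, 1, 2]]
same_seq = sequence_id_attention_mask(seq_id)            # (1, 7, 7) boolean
causal = torch.tril(torch.ones(7, 7, dtype=torch.bool))
attn_mask = same_seq & causal  # attend only to earlier tokens in the same source sequence
```

In the model, a mask like `attn_mask` would only be applied when `attn_uses_sequence_id` is set in the config and `sequence_id` is passed to forward.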