Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support only-within-sequence attention for MosaicGPT #266

Merged
merged 10 commits into from
Mar 30, 2023

Conversation

alextrott16
Copy link
Contributor

@alextrott16 alextrott16 commented Mar 30, 2023

Makes it possible to have attention restricted to tokens within the same source sequence when using pre-concatenated text dataloading.

  • Adds eos_token_id and bos_token_id flags for the text dataloader, which instructs it to add "sequence_id" to the batch using one of those tokens as the separator.
  • Adds attn_uses_sequence_id flag to MosaicGPT config/model and sequence_id to the forward arguments.
  • Modifies the attention bias construction to restrict attention based on sequence_id if the model config is set to use that input.
  • Mops up some old typos.

125m comparison: https://wandb.ai/mosaic-ml/v004-125m
3b comparison: https://wandb.ai/mosaic-ml/v004-3b (for profiling; not a full convergence run)

@vchiley
Copy link
Contributor

vchiley commented Mar 30, 2023

cc @jfrankle

@alextrott16
Copy link
Contributor Author

FYI, I think this PR has picked up some commits from the main branch that maybe weren't part of release/v0.0.4 yet? There are some changed files that I didn't touch in any of my commits.

Copy link
Contributor

@vchiley vchiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few things to address + a few nit

examples/common/text_data.py Outdated Show resolved Hide resolved
examples/common/text_data.py Outdated Show resolved Hide resolved
examples/common/text_data.py Outdated Show resolved Hide resolved
examples/common/text_data.py Outdated Show resolved Hide resolved
examples/common/text_data.py Outdated Show resolved Hide resolved
examples/llm/src/models/mosaic_gpt/mosaic_gpt.py Outdated Show resolved Hide resolved
examples/llm/tests/test_model.py Outdated Show resolved Hide resolved
Copy link
Contributor

@vchiley vchiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@alextrott16 alextrott16 merged commit a80f35d into release/v0.0.4 Mar 30, 2023
@alextrott16 alextrott16 deleted the alex/sequence-id branch March 30, 2023 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants