Support only-within-sequence attention for MosaicGPT #266
Conversation
cc @jfrankle
FYI, I think this PR has picked up some commits from the main branch that maybe weren't part of it.
There are a few things to address + a few nits.
lgtm
Makes it possible to have attention restricted to tokens within the same source sequence when using pre-concatenated text dataloading.
- Adds `eos_token_id` and `bos_token_id` flags for the `text` dataloader, which instruct it to add `"sequence_id"` to the batch using one of those tokens as the separator.
- Adds an `attn_uses_sequence_id` flag to the MosaicGPT config/model and `sequence_id` to the forward arguments.
- Attention is restricted based on `sequence_id` if the model config is set to use that input (a minimal sketch of this masking follows after the links below).

125m comparison: https://wandb.ai/mosaic-ml/v004-125m
3b comparison: https://wandb.ai/mosaic-ml/v004-3b (for profiling; not a full convergence run)
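
For illustration, here is a minimal sketch of how the two pieces could fit together: deriving `sequence_id` from a pre-concatenated token stream using a separator token, and turning it into a within-sequence attention mask. This is not the code in the PR; the helper names `make_sequence_id` and `sequence_id_attention_mask`, and the way the mask is combined with a causal mask, are assumptions written against plain PyTorch.

```python
import torch


def make_sequence_id(input_ids: torch.Tensor, separator_token_id: int) -> torch.Tensor:
    """Label each token in a pre-concatenated batch with the index of its
    source sequence, treating `separator_token_id` (e.g. the EOS token) as
    the end of a sequence. `input_ids` has shape (batch, seq_len)."""
    is_sep = (input_ids == separator_token_id).int()
    # The separator stays with the preceding sequence; tokens after it get the next id.
    return torch.cumsum(is_sep, dim=1) - is_sep


def sequence_id_attention_mask(sequence_id: torch.Tensor) -> torch.Tensor:
    """Boolean mask of shape (batch, seq_len, seq_len), True only where the
    query and key positions belong to the same source sequence."""
    return sequence_id.unsqueeze(-1) == sequence_id.unsqueeze(-2)


# Example: token id 2 plays the role of the EOS separator here (hypothetical values).
batch = torch.tensor([[5, 7, 2, 9, 4, 2, 8]])
seq_id = make_sequence_id(batch, separator_token_id=2)   # [[0, 0, 0, 1, 1, 1, 2]]
same_seq = sequence_id_attention_mask(seq_id)            # (1, 7, 7) boolean
causal = torch.tril(torch.ones(7, 7, dtype=torch.bool))
attn_mask = same_seq & causal  # attend only to earlier tokens in the same source sequence
```

In the model, a mask like `attn_mask` would only be applied when `attn_uses_sequence_id` is set in the config and `sequence_id` is passed to forward.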