Tied word embeddings #1260

gabe-l-hart · 2024-10-03T15:40:32Z

Dependencies

This PR is part of a sequence in support of adding Granite Code. It depends on merging the following PRs:

Safetensors: Safetensors #1255
Bias tensors: Bias tensors #1259

Issues

Description

This PR adds support for models that share weights between the input word embeddings and the output layer (tied word embeddings).

Changes

Add the tie_word_embeddings parameter to TransformerArgs
Add a load_hook to the Transformer class where, if configured with tie_word_embeddings, the self.output module will share weights with the self.tok_embeddings module.
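
For readers unfamiliar with the idea, here is a minimal, framework-free sketch of what a load-time tying hook can look like. It is not the code in this PR: the key names `tok_embeddings.weight` and `output.weight`, the function `tie_weights_hook`, and the one-field `TransformerArgs` stand-in are assumptions made for illustration, and plain nested lists stand in for tensors.

```python
from dataclasses import dataclass


@dataclass
class TransformerArgs:
    # Stand-in with only the field relevant here (hypothetical, simplified).
    tie_word_embeddings: bool = False


# Hypothetical state-dict key names, chosen for illustration only.
EMBED_KEY = "tok_embeddings.weight"
OUTPUT_KEY = "output.weight"


def tie_weights_hook(state_dict: dict, config: TransformerArgs) -> dict:
    """Reuse the embedding weight as the output weight when tying is enabled.

    Checkpoints with tied embeddings often store the shared matrix only once,
    so the output entry is filled in only if it is missing.
    """
    if config.tie_word_embeddings and OUTPUT_KEY not in state_dict:
        # Same object, not a copy: both keys now refer to one weight matrix.
        state_dict[OUTPUT_KEY] = state_dict[EMBED_KEY]
    return state_dict


# Nested lists stand in for a (vocab_size x dim) weight tensor.
checkpoint = {EMBED_KEY: [[0.1, 0.2], [0.3, 0.4]]}

tied = tie_weights_hook(dict(checkpoint), TransformerArgs(tie_word_embeddings=True))
untied = tie_weights_hook(dict(checkpoint), TransformerArgs())
```

With tying enabled, `tied["output.weight"]` is the same object as `tied["tok_embeddings.weight"]`. With the default configuration the state dict is left unchanged. In a PyTorch module the same logic would typically run as a hook before the state dict is loaded.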

Testing

In conjunction with my other changes for Granite Code, I've validated that this logic produces the expected token sequence.

NOTE: If there's any preferred way to include unit tests along with the PR, please let me know and I can get them added! I don't see a familiar unit test structure in the project at this point, so I've been relying on local ad-hoc testing.

pytorch-bot · 2024-10-03T15:40:36Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1260

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c840070 with merge base 6a2a2e8:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Jack-Khuu · 2024-10-07T22:27:15Z

Sweet and simple, @Gasoonjia for a sanity check

gabe-l-hart · 2024-10-09T12:40:20Z

Thanks for working through the chain @Jack-Khuu! This is the next one up

facebook-github-bot added the CLA Signed label Oct 3, 2024

This was referenced Oct 3, 2024

Add support for tied word embeddings #1252

Closed

Tokenizers tokenizer #1261

Merged

gabe-l-hart force-pushed the TiedWordEmbeddings-1252 branch 3 times, most recently from b1e2fff to 254fc51 on October 4, 2024 20:01

Jack-Khuu approved these changes Oct 7, 2024

gabe-l-hart force-pushed the TiedWordEmbeddings-1252 branch 2 times, most recently from e002700 to ced5d03 on October 8, 2024 21:01

feat: Add the option to tie_word_embeddings

c840070

Branch: GraniteCodeSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

gabe-l-hart force-pushed the TiedWordEmbeddings-1252 branch from ced5d03 to c840070 on October 9, 2024 12:39

gabe-l-hart marked this pull request as ready for review October 9, 2024 12:40

Jack-Khuu merged commit 438ebb1 into pytorch:main Oct 9, 2024
52 checks passed

gabe-l-hart deleted the TiedWordEmbeddings-1252 branch October 10, 2024 16:07

gabe-l-hart mentioned this pull request Oct 31, 2024

Granite code support #1336

Merged

6 tasks
