
Add support for tied word embeddings #1252

Closed
@gabe-l-hart

🚀 The feature, motivation and pitch

The transformers implementation of llama has the option to tie the input word embeddings to the output layer so that the two share a single weight matrix. The request here is to add support for that feature in torchchat.
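
For context, weight tying in a decoder-only model looks roughly like the following minimal PyTorch sketch (the class and attribute names here are illustrative, not torchchat's actual ones):

```python
import torch.nn as nn


class TinyDecoder(nn.Module):
    """Toy decoder illustrating weight tying; names are illustrative only."""

    def __init__(self, vocab_size: int, dim: int, tie_word_embeddings: bool = True):
        super().__init__()
        self.tok_embeddings = nn.Embedding(vocab_size, dim)
        self.output = nn.Linear(dim, vocab_size, bias=False)
        if tie_word_embeddings:
            # The output projection reuses the embedding matrix, so the
            # checkpoint only needs to store that tensor once.
            self.output.weight = self.tok_embeddings.weight
```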

Alternatives

Models that require tied embeddings could instead have the embedding tensor duplicated into the output weight during the checkpoint conversion process.
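
A rough sketch of that conversion-time workaround, assuming a plain state-dict checkpoint; the key names ("tok_embeddings.weight", "output.weight") are assumptions for illustration and the real conversion script may differ:

```python
import torch


def untie_embeddings(in_path: str, out_path: str) -> None:
    """Duplicate the shared embedding so the output layer gets its own copy.

    Key names are assumptions for illustration; the real conversion
    script may use different names.
    """
    state_dict = torch.load(in_path, map_location="cpu")
    if "output.weight" not in state_dict:
        state_dict["output.weight"] = state_dict["tok_embeddings.weight"].clone()
    torch.save(state_dict, out_path)
```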

Additional context

This issue is a piece of the puzzle for adding support for Granite Code 3b/8b, which use the llama architecture in transformers but take advantage of several pieces of the architecture that are not currently supported by torchchat. The work-in-progress for Granite Code can be found on my fork: https://github.com/gabe-l-hart/torchchat/tree/GraniteCodeSupport.

RFC (Optional)

I have a working implementation of this that I plan to put up as a pull request. The changes are roughly:

  • Add tie_word_embeddings to TransformerArgs
  • Copy tok_embeddings.weight to model.output.weight in a load_hook in the Transformer module (see the sketch after this list)
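
Roughly, assuming simplified versions of torchchat's TransformerArgs and Transformer (the real field names, layers, and exact load-hook signature may differ), the two changes could fit together like this:

```python
from dataclasses import dataclass

import torch.nn as nn


@dataclass
class TransformerArgs:
    vocab_size: int
    dim: int
    # Proposed new flag; defaults to False to preserve existing behavior.
    tie_word_embeddings: bool = False
    # ... other existing fields omitted ...


class Transformer(nn.Module):
    def __init__(self, config: TransformerArgs):
        super().__init__()
        self.config = config
        self.tok_embeddings = nn.Embedding(config.vocab_size, config.dim)
        self.output = nn.Linear(config.dim, config.vocab_size, bias=False)
        # ... attention / feed-forward layers omitted ...
        if config.tie_word_embeddings:
            # Tied checkpoints ship only the embedding weight, so mirror it
            # into output.weight while the state dict is being loaded.
            self._register_load_state_dict_pre_hook(self._tie_weights_load_hook)

    def _tie_weights_load_hook(self, state_dict, prefix, *args):
        embed_key = prefix + "tok_embeddings.weight"
        output_key = prefix + "output.weight"
        if embed_key in state_dict and output_key not in state_dict:
            state_dict[output_key] = state_dict[embed_key]
```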
