
Mllama: update docs #34334

Merged: 4 commits into huggingface:main on Oct 30, 2024
Conversation

zucchini-nlp (Member) commented on Oct 23, 2024:

What does this PR do?

Fixes #34304 and adds info about lm-head resizing. Maybe also fixes #33819?

Comment on lines 41 to 55
```python
# `torch` and a loaded Mllama `model` are assumed from earlier in the doc.
# Fit a multivariate normal to the existing lm-head rows (mean + scaled covariance).
pre_expansion_embeddings = model.language_model.lm_head.weight.data
mu = torch.mean(pre_expansion_embeddings, dim=0).float()
n = pre_expansion_embeddings.size()[0]
sigma = ((pre_expansion_embeddings - mu).T @ (pre_expansion_embeddings - mu)) / n
dist = torch.distributions.multivariate_normal.MultivariateNormal(mu, covariance_matrix=1e-5 * sigma)

num_new_tokens = 1  # 1 for the `"<|image|>"` token
lm_head_weights = model.language_model.lm_head.weight

# Sample one row per new token from the fitted distribution and append it to the lm head.
new_token_embedding = torch.stack(tuple(dist.sample() for _ in range(num_new_tokens)), dim=0).to(device=lm_head_weights.device, dtype=lm_head_weights.dtype)
lm_head_weights.data = torch.cat([lm_head_weights.data, new_token_embedding], dim=0)
model.language_model.lm_head.out_features = lm_head_weights.data.shape[0]  # keep the module's metadata in sync
```
Collaborator commented:

This should already be done internally if you use the correct flag for resize_token_embeddings.
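
As a hedged illustration of that suggestion: on recent transformers versions, resize_token_embeddings accepts a mean_resizing flag that initializes new rows from the mean and covariance of the existing embeddings. The flag's availability and the checkpoint id below are assumptions for illustration, not something stated in this thread.

```python
from transformers import AutoProcessor, MllamaForConditionalGeneration

# Checkpoint id is illustrative.
model_id = "meta-llama/Llama-3.2-11B-Vision"
model = MllamaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

# Resize the token embeddings; with mean_resizing=True (assumed available in
# your transformers version) the newly added rows are drawn from a distribution
# fitted to the existing weights, so no manual sampling is needed.
model.resize_token_embeddings(len(processor.tokenizer), mean_resizing=True)
```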

zucchini-nlp (author) commented:

From what I see, we don't let users specify which embeddings to resize and use the input embeddings by default. In case the weights are tied (not the case for Mllama), we also resize the output embeddings.

Or do you mean there is another method similar to resize_token_embeddings? I might have overlooked that.
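
A rough sketch of the behaviour described above, simplified and not the actual transformers internals; the helper name sketch_resize is made up for illustration.

```python
import torch.nn as nn

def sketch_resize(model, new_num_tokens: int) -> None:
    # The input embeddings are always the ones being resized.
    old = model.get_input_embeddings()
    new = nn.Embedding(
        new_num_tokens, old.embedding_dim,
        device=old.weight.device, dtype=old.weight.dtype,
    )
    num_to_copy = min(old.num_embeddings, new_num_tokens)
    new.weight.data[:num_to_copy] = old.weight.data[:num_to_copy]
    model.set_input_embeddings(new)

    # The lm head is only affected indirectly: if the weights are tied,
    # re-tying makes it follow the new input embeddings. There is no
    # user-facing switch to resize only the lm head.
    if getattr(model.config, "tie_word_embeddings", False):
        model.tie_weights()
```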

Collaborator commented:

We use the output of get_input_embeddings, which should always be the input embedding, and by default the output embedding from get_output_embeddings is resized when the weights are tied. But you are right, you can't resize only the lm head.

Though there might be some util function you can re-use, no? 🤗 Feel free to merge!

zucchini-nlp (author) commented:

Right, there was a way to hide all the ugly code in private methods.
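
For reference, one way the "ugly code" could be hidden is by reusing a private helper such as PreTrainedModel._get_resized_lm_head. Since it is a private API, its presence and exact signature are an assumption and may differ across transformers versions.

```python
num_new_tokens = 1  # e.g. for the `"<|image|>"` token

# Assumes `model` is an already loaded MllamaForConditionalGeneration and that
# the private `_get_resized_lm_head` helper exists on its language model.
old_lm_head = model.language_model.get_output_embeddings()
new_lm_head = model.language_model._get_resized_lm_head(
    old_lm_head,
    new_num_tokens=old_lm_head.weight.shape[0] + num_new_tokens,
)
model.language_model.set_output_embeddings(new_lm_head)
```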

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

zucchini-nlp merged commit 0f764a5 into huggingface:main on Oct 30, 2024.
9 checks passed