handle missing scale_inv_name #2

Merged (3 commits) on Dec 27, 2024

YangWang92 (Contributor)

Fixed an issue where `weight` and `weight_scale_inv` (e.g. `model.layers.39.mlp.experts.92.gate_proj.weight` and `model.layers.39.mlp.experts.92.gate_proj.weight_scale_inv`) were stored in different SafeTensors files, which caused an assertion error because `scale_inv_name` was not in the `state_dict` loaded from the current file.
Added `torch.cuda.empty_cache()` to release unused cached GPU memory.
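
The core of the fix is to stop assuming that a weight's `_scale_inv` companion lives in the same shard, and instead look it up through the checkpoint index. Below is a minimal pure-Python sketch of that idea. It assumes a Hugging Face-style `weight_map` (tensor name → shard filename), as found in a sharded checkpoint's `model.safetensors.index.json`. `ShardCache`, `get_tensor`, `find_scale_inv` and the in-memory `shards` dict are hypothetical stand-ins, not the code merged in this PR, and plain strings stand in for tensors so it runs without torch or safetensors.

```python
from collections import OrderedDict
from typing import Callable, Dict, Optional

# Hypothetical stand-in for a loaded shard: tensor name -> tensor (strings here).
Shard = Dict[str, object]


class ShardCache:
    """Loads shards on demand and keeps at most `max_shards` resident (LRU).

    `loader` is a hypothetical callable mapping a shard filename to its
    tensors; in a real converter it would wrap a safetensors load.
    """

    def __init__(self, loader: Callable[[str], Shard], max_shards: int = 2):
        self.loader = loader
        self.max_shards = max_shards
        self.shards: "OrderedDict[str, Shard]" = OrderedDict()
        self.loads = 0  # number of times the loader was actually called

    def get(self, filename: str) -> Shard:
        if filename in self.shards:
            self.shards.move_to_end(filename)  # mark as most recently used
            return self.shards[filename]
        self.shards[filename] = self.loader(filename)
        self.loads += 1
        if len(self.shards) > self.max_shards:
            # Evict the least recently used shard. With GPU tensors, this is
            # the point where calling torch.cuda.empty_cache() could return
            # the freed cached memory (assumption; not done in this sketch).
            self.shards.popitem(last=False)
        return self.shards[filename]


def get_tensor(name: str, weight_map: Dict[str, str], cache: ShardCache) -> object:
    """Fetch `name` from whichever shard the index says holds it."""
    if name not in weight_map:
        raise KeyError(f"{name} is not listed in the index weight_map")
    return cache.get(weight_map[name])[name]


def find_scale_inv(weight_name: str, weight_map: Dict[str, str],
                   cache: ShardCache) -> Optional[object]:
    """Return the `<weight>_scale_inv` tensor, or None if the index has none."""
    scale_inv_name = f"{weight_name}_scale_inv"
    if scale_inv_name not in weight_map:
        return None  # no scale recorded for this weight
    return get_tensor(scale_inv_name, weight_map, cache)


# Usage: the weight and its scale sit in different (fake) shards.
shards = {
    "model-00001.safetensors": {"a.weight": "W"},
    "model-00002.safetensors": {"a.weight_scale_inv": "S"},
}
weight_map = {
    "a.weight": "model-00001.safetensors",
    "a.weight_scale_inv": "model-00002.safetensors",
}
cache = ShardCache(loader=lambda f: shards[f], max_shards=2)
print(find_scale_inv("a.weight", weight_map, cache))  # prints S
```

Bounding the number of resident shards is what keeps memory roughly flat while scales are pulled from other files; the LRU policy here is one reasonable choice, not necessarily the one used in the repository.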
@OpenSourceRonin

I've uploaded the converted bf16 model here for everyone to use freely: https://huggingface.co/collections/opensourcerelease/deepseek-v3-bf16-676d7fa1b3f500d39f8f559b

@stack-heap-overflow merged commit 8f1c948 into deepseek-ai:main on Dec 27, 2024

3 participants