handle missing scale_inv_name #2

YangWang92 · 2024-12-26T15:09:30Z

Fixed an issue where a `weight` and its `weight_scale_inv` (e.g. `model.layers.39.mlp.experts.92.gate_proj.weight` and `model.layers.39.mlp.experts.92.gate_proj.weight_scale_inv`) were stored in different safetensors files, which caused an assertion error because `scale_inv_name` was not in the `state_dict`.
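For readers who want to see the shape of the fix, here is a minimal sketch, not the actual diff from this PR. It assumes the checkpoint ships an index that maps each tensor name to its shard file. The names `resolve_tensor`, `find_scale_inv`, `weight_map`, `loaded_shards`, and `load_shard` are hypothetical and chosen only for illustration.

```python
def resolve_tensor(name, weight_map, loaded_shards, load_shard):
    """Return tensor `name`, loading the shard that holds it on demand.

    weight_map:    dict of tensor name -> shard file name (assumed index layout)
    loaded_shards: dict of shard file name -> {tensor name: tensor}, used as a cache
    load_shard:    hypothetical callable that reads one shard file into a dict
    Raises KeyError when the name is not listed in the index.
    """
    shard_name = weight_map[name]
    if shard_name not in loaded_shards:
        loaded_shards[shard_name] = load_shard(shard_name)
    return loaded_shards[shard_name][name]


def find_scale_inv(weight_name, weight_map, loaded_shards, load_shard):
    """Look up `<weight_name>_scale_inv`, even if it lives in another shard.

    Returns None instead of failing an assertion when no scale tensor exists.
    """
    scale_inv_name = f"{weight_name}_scale_inv"
    try:
        return resolve_tensor(scale_inv_name, weight_map, loaded_shards, load_shard)
    except KeyError:
        return None
```

In a real conversion script, `load_shard` could wrap something like `safetensors.torch.load_file`, and the caller would decide whether a `None` result means "skip dequantization for this tensor" or "log a warning". Both choices are assumptions here, not something the PR description states.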


Added `torch.cuda.empty_cache()` to free up unused memory on the GPU.
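One way the cache clearing can fit into shard handling is sketched below, under assumptions: loaded shards are kept in an `OrderedDict`, the oldest ones are dropped once a limit is exceeded, and a release hook runs afterwards. The helper `evict_oldest` and its parameters `max_loaded` and `release_cache` are hypothetical and are not taken from the PR.

```python
from collections import OrderedDict


def evict_oldest(loaded_shards, max_loaded=2, release_cache=None):
    """Drop the oldest cached shards until at most `max_loaded` remain.

    loaded_shards: OrderedDict of shard file name -> tensors, oldest first
    release_cache: optional zero-argument hook called once after any eviction,
                   for example torch.cuda.empty_cache
    Returns the list of evicted shard names.
    """
    evicted = []
    while len(loaded_shards) > max_loaded:
        shard_name, _ = loaded_shards.popitem(last=False)  # remove oldest entry
        evicted.append(shard_name)
    if evicted and release_cache is not None:
        # Dropping references only returns memory to the caching allocator;
        # the hook lets the caller hand unused blocks back to the device.
        release_cache()
    return evicted
```

With PyTorch available, the call would look like `evict_oldest(loaded_shards, max_loaded=2, release_cache=torch.cuda.empty_cache)`. Passing the hook as a parameter keeps the helper usable on machines without a GPU.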

OpenSourceRonin · 2024-12-26T16:47:57Z

I've uploaded the converted bf16 model here for everyone to use freely: https://huggingface.co/collections/opensourcerelease/deepseek-v3-bf16-676d7fa1b3f500d39f8f559b

YangWang92 added 3 commits December 26, 2024 23:09

sort filename to reduce memory costs

e6e66fd

Add CUDA cache clearing in memory management

65d8f5f

Added torch.cuda.empty_cache() to free up unused memory on the GPU.

stack-heap-overflow merged commit 8f1c948 into deepseek-ai:main Dec 27, 2024
