Description
System Info
- transformers version: 4.52.4
- Platform: Linux-6.1.123+-x86_64-with-glibc2.35
- Python version: 3.11.13
- Huggingface_hub version: 0.33.0
- Safetensors version: 0.5.3
- Accelerate version: 1.7.0
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (GPU?): 2.6.0+cu124 (False)
- Tensorflow version (GPU?): 2.18.0 (False)
- Flax version (CPU?/GPU?/TPU?): 0.10.6 (cpu)
- Jax version: 0.5.2
- JaxLib version: 0.5.1
- Using distributed or parallel set-up in script?: no
Who can help?
@Cyrilvallez (this is a pretty rough guess)
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
https://colab.research.google.com/drive/1FwWTcK9N9nDiLeDyuW9ckeLSzZfpI7wJ?usp=sharing
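For reference, a minimal sketch of the failing load outside the notebook. The device_map="auto" load path is an assumption about how accelerate (and meta-device initialization) gets involved; see the Colab for the exact call.

```python
# Minimal sketch, assuming a device_map-based load path (assumption; see Colab).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "ejschwartz/resym-fielddecoder",
    device_map="auto",
)

# Expected: the tied input embedding is populated from the checkpoint.
# Observed: it is still an uninitialized meta tensor.
print(model.transformer.wte.weight.is_meta)
```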
Expected behavior
I expected the model to load and transformer.wte.weight to receive its weights from the safetensors checkpoint. Instead, the parameter has no weights, which causes accelerate to crash.
At the very least, I expected a warning about the missing keys. There is actually some code to warn about this, but in this case it did not fire because the missing key is a tied parameter.
I can't fully pin down or fix the bug, but it is related to safetensors deserialization of tied parameters: the tied parameter's weights are never set from the saved checkpoint and remain uninitialized meta tensors.
Disabling safetensors avoids the problem.
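For example (a sketch, assuming safetensors is disabled via the use_safetensors flag; this presumes a pytorch_model.bin is available for the repo, otherwise the load fails outright):

```python
# Workaround sketch: force the non-safetensors checkpoint (assumes a .bin
# checkpoint exists for this repo).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "ejschwartz/resym-fielddecoder",
    use_safetensors=False,
    device_map="auto",
)

print(model.transformer.wte.weight.is_meta)  # False once weights load correctly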
Setting tie_word_embeddings=False, oddly, does emit a warning:
Some weights of GPTBigCodeForCausalLM were not initialized from the model checkpoint at ejschwartz/resym-fielddecoder and are newly initialized: ['transformer.wte.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
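For reference, a sketch of how that variant was invoked (the exact call is an assumption; from_pretrained forwards the tie_word_embeddings override to the config):

```python
# Sketch of the untied variant that does trigger the missing-key warning.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "ejschwartz/resym-fielddecoder",
    tie_word_embeddings=False,  # assumption: passed via from_pretrained kwargs
)
```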
Here is the original issue I opened in accelerate, but I think the model that accelerate is operating on is invalid.