Safetensors deserializing silently mishandles tied parameters #38870

Open
@edmcman

System Info

  • transformers version: 4.52.4
  • Platform: Linux-6.1.123+-x86_64-with-glibc2.35
  • Python version: 3.11.13
  • Huggingface_hub version: 0.33.0
  • Safetensors version: 0.5.3
  • Accelerate version: 1.7.0
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (GPU?): 2.6.0+cu124 (False)
  • Tensorflow version (GPU?): 2.18.0 (False)
  • Flax version (CPU?/GPU?/TPU?): 0.10.6 (cpu)
  • Jax version: 0.5.2
  • JaxLib version: 0.5.1
  • Using distributed or parallel set-up in script?: no

Who can help?

@Cyrilvallez (this is a pretty rough guess)

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

https://colab.research.google.com/drive/1FwWTcK9N9nDiLeDyuW9ckeLSzZfpI7wJ?usp=sharing

Expected behavior

I expected the model to load with transformer.wte.weight populated from the safetensors checkpoint. Instead, that parameter has no weights, which causes accelerate to crash.

At the very least, I expected a warning about the missing keys. There is actually some code to warn about this, but in this case it did not fire because the missing key is a tied parameter.
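To make the suspected failure mode concrete, here is a toy sketch in plain Python. It is not the transformers implementation; the function names are hypothetical, and the assumption that transformer.wte.weight is tied to lm_head.weight (with only the latter present in the checkpoint) is mine, not something confirmed above. It shows how a missing-key check that filters out every tied key stays silent, and how a stricter check could still report a tied key when none of its partners is in the checkpoint.

```python
def naive_missing_warning(model_keys, checkpoint_keys, tied_groups):
    """Toy check: drop every tied key from the missing-key report."""
    tied = set().union(*tied_groups) if tied_groups else set()
    missing = [k for k in model_keys if k not in set(checkpoint_keys)]
    # Any key that belongs to a tied group is silenced, present partner or not.
    return [k for k in missing if k not in tied]


def strict_missing_warning(model_keys, checkpoint_keys, tied_groups):
    """Toy check: silence a tied key only if a partner is in the checkpoint."""
    present = set(checkpoint_keys)
    report = []
    for key in model_keys:
        if key in present:
            continue
        group = next((g for g in tied_groups if key in g), None)
        if group is not None and any(k in present for k in group):
            continue  # recoverable by re-tying to the partner that was loaded
        report.append(key)
    return report


# Assumed key names for illustration only.
MODEL_KEYS = ["transformer.wte.weight", "lm_head.weight"]
TIED_GROUPS = [{"transformer.wte.weight", "lm_head.weight"}]

# Checkpoint holds only one member of the tied pair: both checks stay quiet.
quiet_naive = naive_missing_warning(MODEL_KEYS, ["lm_head.weight"], TIED_GROUPS)
quiet_strict = strict_missing_warning(MODEL_KEYS, ["lm_head.weight"], TIED_GROUPS)

# Checkpoint holds neither: the naive check is still quiet, the strict one is not.
empty_naive = naive_missing_warning(MODEL_KEYS, [], TIED_GROUPS)
empty_strict = strict_missing_warning(MODEL_KEYS, [], TIED_GROUPS)
```

Even the stricter check only tells you that a tied key is recoverable in principle. If the tie is never actually re-applied after loading, the parameter would still be left without data, which matches what I am seeing.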

I can't clearly describe or fix the bug, but it has to do with safetensors deserialization of tied parameters. The tied parameter's weights are never set from the saved checkpoint, so they remain empty placeholders on the meta device.

Disabling safetensors avoids the problem.

Setting tie_word_embeddings=False oddly does emit a warning:

Some weights of GPTBigCodeForCausalLM were not initialized from the model checkpoint at ejschwartz/resym-fielddecoder and are newly initialized: ['transformer.wte.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Here is the original issue I opened in accelerate, but I think the model that accelerate is operating on is already invalid.
