Error when converting "state-spaces/mamba2-130m" weights to huggingface-compatible format #32496

Closed
learning-chip opened this issue Aug 7, 2024 · 1 comment · Fixed by #32580
Labels: bug, Good Second Issue

Comments


learning-chip commented Aug 7, 2024

System Info

  • Transformers version: 4.40.0

Who can help?

@molbap @ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I tried to load https://huggingface.co/state-spaces/mamba2-130m into the HF-compatible Mamba-2 implementation (#32080) using the convert_mamba2_ssm_checkpoint_to_pytorch.py script, but the script assumes the model weights are in safetensors format:

with safe_open(mamba2_checkpoint_path, framework="pt") as f:
    for k in f.keys():
        newk = k.removeprefix("model.")
        original_state_dict[newk] = f.get_tensor(k).clone()

but the weight file is in torch .bin format and cannot be opened this way.
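For reference, a minimal sketch of what a .bin fallback could look like, assuming the checkpoint is a plain torch state dict and that the same "model." prefix handling applies (this is not part of the current script):

```python
import torch

# Hypothetical fallback for .bin checkpoints: load the raw state dict with torch
# and apply the same "model." prefix stripping as the safetensors branch above.
original_state_dict = {}
state_dict = torch.load(mamba2_checkpoint_path, map_location="cpu")
for k, v in state_dict.items():
    newk = k.removeprefix("model.")
    original_state_dict[newk] = v.clone()
```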

Also, the script requires a tokenizer path:

parser.add_argument(
    "-c",
    "--tokenizer_model_path",
    type=str,
    required=True,
    help="Path to a `config.json` file corresponding to a Mamba2Config of the original mamba2_ssm model.",
)

but state-spaces/mamba2-130m reuses the EleutherAI/gpt-neox-20b tokenizer instead of shipping its own.
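A possible workaround would be to let the script fall back to the GPT-NeoX tokenizer from the Hub when no local tokenizer path is given. A rough sketch (the output directory name is a placeholder, not the script's actual argument):

```python
from transformers import AutoTokenizer

# Assumption: fall back to the tokenizer that state-spaces/mamba2-130m reuses
# and save it alongside the converted weights.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
tokenizer.save_pretrained(output_dir)  # output_dir: placeholder for the conversion target
```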

Expected behavior

convert_mamba2_ssm_checkpoint_to_pytorch.py should be able to convert these Mamba-2 weights as well.
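Concretely, the expected end state is that the converted directory loads with the HF Mamba2 classes from #32080, along the lines of this sketch (paths are placeholders):

```python
from transformers import Mamba2ForCausalLM, AutoTokenizer

# Assumed end-to-end check after a successful conversion.
model = Mamba2ForCausalLM.from_pretrained("/path/to/converted/mamba2-130m-hf")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

inputs = tokenizer("Hey how are you doing?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(out[0]))
```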

molbap (Contributor) commented Aug 7, 2024

Thanks for the issue! Yes, the current conversion script was written for the Mistral/Codestral Mamba 2 release, which uses safetensors plus its own tokenizer. If you want to work on a modification to the conversion script in a new PR, feel free to do so and we can help! Otherwise, I'll take a look at it later :)

ArthurZucker added the Good Second Issue label on Aug 8, 2024