Support Huggingface models from safetensors #1249

Closed
@gabe-l-hart

🚀 The feature, motivation and pitch

There are many models on Huggingface that are published as safetensors rather than model.pth checkpoints. The request here is to support converting and loading those checkpoints into a format that is usable with torchchat.

There are several places where this limitation is currently enforced:

  • _download_hf_snapshot method explicitly ignores safetensors files.
  • convert_hf_checkpoint explicitly looks for pytorch_model.bin.index.json, which is named differently for models that use safetensors (e.g. model.safetensors.index.json)
  • convert_hf_checkpoint only supports torch.load to load the state_dict rather than safetensors.torch.load
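
For the first point, one possible selection policy is sketched below: given the list of files in a repo, build the patterns to skip so that only one weight format is fetched. The helper name `ignore_patterns_for`, the pattern lists, and the choice to prefer safetensors when both formats exist are all assumptions for illustration, not torchchat's current behavior.

```python
from fnmatch import fnmatch
from typing import List

# Hypothetical pattern sets; a real repo may use other file names.
SAFETENSORS_GLOB = "*.safetensors"
TORCH_GLOBS = ["*.bin", "*.bin.index.json", "*.pth"]


def ignore_patterns_for(repo_files: List[str]) -> List[str]:
    """Return glob patterns to skip so weights are downloaded in one format only."""
    has_safetensors = any(fnmatch(name, SAFETENSORS_GLOB) for name in repo_files)
    has_torch = any(
        fnmatch(name, pattern) for name in repo_files for pattern in TORCH_GLOBS
    )
    if has_safetensors and has_torch:
        # Assumption: prefer safetensors and skip the pickle-based copies.
        return list(TORCH_GLOBS)
    # Only one format (or none) is present, so nothing needs to be skipped.
    return []


if __name__ == "__main__":
    print(ignore_patterns_for(["config.json", "model.safetensors", "pytorch_model.bin"]))
```

The resulting list could be passed as the `ignore_patterns` argument of `huggingface_hub.snapshot_download`. Note that a broad pattern such as `*.bin` may also skip non-weight files, so a real implementation would likely need tighter patterns.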

Alternatives

Currently, the safetensors -> model.pth conversion can be done manually after downloading a model locally, so this could be solved with documentation instead of code.
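
A minimal sketch of what such a manual conversion might look like is below. It merges every `*.safetensors` shard in a local directory and writes a single torch checkpoint. The function names (`merge_shards`, `convert_dir`) are hypothetical, it uses `safetensors.torch.load_file` to read from disk, and it does not cover any key remapping that torchchat's own converter may perform.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def _load_safetensors(path: Path) -> Dict[str, object]:
    # Imported lazily so the merge logic can be exercised without the library.
    from safetensors.torch import load_file

    return load_file(str(path))


def merge_shards(
    shard_paths: Iterable[Path],
    load_fn: Callable[[Path], Dict[str, object]] = _load_safetensors,
) -> Dict[str, object]:
    """Load each shard and combine the tensors into one state_dict."""
    merged: Dict[str, object] = {}
    for path in sorted(Path(p) for p in shard_paths):
        shard = load_fn(path)
        overlap = merged.keys() & shard.keys()
        if overlap:
            raise ValueError(f"duplicate tensor names in {path}: {sorted(overlap)}")
        merged.update(shard)
    return merged


def convert_dir(model_dir: str, out_name: str = "model.pth") -> Path:
    """Write all safetensors shards in model_dir to a single torch checkpoint."""
    import torch

    directory = Path(model_dir)
    shards = list(directory.glob("*.safetensors"))
    if not shards:
        raise FileNotFoundError(f"no .safetensors files found in {directory}")
    out_path = directory / out_name
    torch.save(merge_shards(shards), out_path)
    return out_path
```

Usage would be along the lines of `convert_dir("/path/to/downloaded/model")`, where the path is a placeholder for wherever the model snapshot was saved.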

Additional context

This issue is a piece of the puzzle for adding support for Granite Code 3b/8b, which use the llama architecture in transformers but take advantage of several pieces of the architecture that are not currently supported by torchchat. The work-in-progress for Granite Code can be found on my fork: https://github.com/gabe-l-hart/torchchat/tree/GraniteCodeSupport

RFC (Optional)

I have a working implementation to support safetensors during download and conversion that I plan to submit as a PR. The changes address the three points in code referenced above:

  1. Allow the download of safetensors files in _download_hf_snapshot
    • I'm not yet sure how to avoid double-downloading weights for models that have both safetensors and model.pth, so I will look to solve this before finishing the work.
  2. When looking for the tensor index file, search for all files ending in .index.json, and if a single file is found, use that one
  3. When loading the state_dict, use the correct method based on the type of file (torch.load or safetensors.torch.load)
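
Points 2 and 3 above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the planned PR: the helper names (`pick_index_file`, `weight_format`, `load_state_dict`, `load_sharded`) are hypothetical, the file is read with `safetensors.torch.load_file`, and the index file is assumed to follow the usual Hugging Face layout with a `weight_map` entry mapping tensor names to shard file names.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional


def pick_index_file(filenames: List[str]) -> Optional[str]:
    """Return the only *.index.json name, or None if there are zero or several."""
    candidates = [name for name in filenames if name.endswith(".index.json")]
    return candidates[0] if len(candidates) == 1 else None


def weight_format(filename: str) -> str:
    """Classify a weight file by extension."""
    return "safetensors" if filename.endswith(".safetensors") else "torch"


def load_state_dict(path: Path) -> Dict[str, object]:
    """Load one weight file with the loader that matches its format."""
    if weight_format(path.name) == "safetensors":
        from safetensors.torch import load_file  # lazy import

        return load_file(str(path))
    import torch  # lazy import

    return torch.load(str(path), map_location="cpu", weights_only=True)


def load_sharded(model_dir: Path) -> Dict[str, object]:
    """Merge all shards listed in the directory's single index file."""
    index_name = pick_index_file([p.name for p in model_dir.iterdir()])
    if index_name is None:
        raise FileNotFoundError(f"expected exactly one *.index.json in {model_dir}")
    # Assumption: the index has a "weight_map" of tensor name -> shard file name.
    weight_map = json.loads((model_dir / index_name).read_text())["weight_map"]
    merged: Dict[str, object] = {}
    for shard in sorted(set(weight_map.values())):
        merged.update(load_state_dict(model_dir / shard))
    return merged
```

Returning None when several index files are present keeps the ambiguous case explicit, so the caller can decide whether to fail or fall back to a preferred format.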
