Feature request: support different input/output formats in the same recipe #2732

Open
@andrewor14

Description

Today, users have to manually convert checkpoints between formats such as .pth and .safetensors before or after fine-tuning with torchtune.

Example 1: torchtitan -> torchtune -> HF transformers. torchtitan outputs .dcp, which can be converted to .pt, but downstream consumers like HF transformers may expect/prefer .safetensors.

Example 2: torchtune -> executorch. Models on HF hub are often in .safetensors format, but executorch expects .pth.

To interop with other frameworks, this conversion is currently a separate manual step before or after fine-tuning in torchtune. For Example 2, @jainapurva had to run the following by hand:

import torch
from torchtune.models import convert_weights
from torchtune.training import FullModelHFCheckpointer

# Load the fine-tuned checkpoint through the HF checkpointer
checkpointer = FullModelHFCheckpointer(
    checkpoint_dir='/home/appy/checkpoints/Llama3.1-8B_oasst1_qat/epoch_0',
    checkpoint_files=[...],
    output_dir='/home/appy/checkpoints/Llama3.1-8B_oasst1_qat/epoch_0',
    model_type='LLAMA3',
)
sd = checkpointer.load_checkpoint()
# Convert the torchtune state dict to Meta-style keys, then save it as .pth
sd = convert_weights.tune_to_meta(sd['model'])
torch.save(sd, "/home/appy/checkpoints/Llama3.1-8B_oasst1_qat/epoch_0/checkpoint.pth")

Proposal: torchtune recipes should support different input and output formats in the same recipe. This would improve the end-to-end UX for users who want to interop with other frameworks: the recipe would perform the conversion itself, so users would no longer need the extra manual step shown in the code above. Better support for these end-to-end flows is also the direction torchao is moving in more generally.
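To make the proposal a bit more concrete, here is a minimal sketch of how a recipe could pick the output format independently of the input format. This is not an existing torchtune API: `SUFFIX_TO_FORMAT`, `resolve_output_format`, and `save_state_dict` are hypothetical names used only for illustration. The only real library calls are `torch.save` and `safetensors.torch.save_file`, both imported lazily so the format-selection logic can be read and run on its own.

```python
from pathlib import Path
from typing import Any, Dict, Optional

# Hypothetical suffix-to-format table; not part of torchtune.
SUFFIX_TO_FORMAT = {".pt": "torch", ".pth": "torch", ".safetensors": "safetensors"}


def resolve_output_format(output_path: str, output_format: Optional[str] = None) -> str:
    """Return "torch" or "safetensors" (hypothetical helper).

    An explicit output_format wins; otherwise the format is inferred
    from the file suffix of output_path.
    """
    if output_format is not None:
        if output_format not in set(SUFFIX_TO_FORMAT.values()):
            raise ValueError(f"Unsupported output format: {output_format!r}")
        return output_format
    suffix = Path(output_path).suffix.lower()
    if suffix not in SUFFIX_TO_FORMAT:
        raise ValueError(f"Cannot infer output format from suffix {suffix!r}")
    return SUFFIX_TO_FORMAT[suffix]


def save_state_dict(
    state_dict: Dict[str, Any],
    output_path: str,
    output_format: Optional[str] = None,
) -> str:
    """Save state_dict in the resolved format and return that format."""
    fmt = resolve_output_format(output_path, output_format)
    if fmt == "safetensors":
        # Lazy import: only needed when writing .safetensors files.
        from safetensors.torch import save_file

        save_file(state_dict, output_path)
    else:
        # Lazy import: only needed when writing .pt/.pth files.
        import torch

        torch.save(state_dict, output_path)
    return fmt
```

With something along these lines, the executorch flow in Example 2 could end with `save_state_dict(sd, ".../checkpoint.pth")` inside the recipe instead of a separate script. The sketch only covers the on-disk file format; key-name conversions such as `convert_weights.tune_to_meta` would still need to be applied before saving, and `save_file` may reject tensors that are non-contiguous or share memory.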

Related torchtitan issue: pytorch/torchtitan#1177
Related executorch issue: pytorch/executorch#3303
