Description
Today, users have to do manual conversions between `.pth` and `.safetensors` formats before/after fine-tuning with torchtune.
Example 1: torchtitan -> torchtune -> HF transformers. torchtitan outputs `.dcp`, which can be converted to `.pt`, but downstream consumers like HF transformers may expect or prefer `.safetensors`.
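For reference, the `.dcp` -> `.pt` step in Example 1 can already be done with PyTorch's distributed checkpoint (DCP) format utilities. A minimal sketch, with placeholder paths (this is standalone PyTorch tooling, not part of any torchtune recipe):

```python
# Minimal sketch: consolidate a DCP checkpoint directory into a single
# torch.save file. Paths are placeholders.
from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

dcp_to_torch_save("path/to/dcp_checkpoint_dir", "path/to/checkpoint.pt")
```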
Example 2: torchtune -> executorch. Models on the HF hub are often in `.safetensors` format, but executorch expects `.pth`.
For Example 2, @jainapurva had to do the following conversion manually:
```python
import torch

from torchtune.training import FullModelHFCheckpointer
from torchtune.models import convert_weights

# Load the fine-tuned checkpoint from HF (.safetensors) format
checkpointer = FullModelHFCheckpointer(
    checkpoint_dir='/home/appy/checkpoints/Llama3.1-8B_oasst1_qat/epoch_0',
    checkpoint_files=[...],
    output_dir='/home/appy/checkpoints/Llama3.1-8B_oasst1_qat/epoch_0',
    model_type='LLAMA3',
)
sd = checkpointer.load_checkpoint()

# Convert the state dict from torchtune to Meta format and save it as .pth
sd = convert_weights.tune_to_meta(sd['model'])
torch.save(sd, "/home/appy/checkpoints/Llama3.1-8B_oasst1_qat/epoch_0/checkpoint.pth")
```
Proposal: torchtune recipes should support different input and output formats in the same recipe. This would improve end-to-end UX for users who want to interoperate with other frameworks: the recipe would perform the conversion for them, so they would no longer need the extra manual step shown above. Providing better support for these end-to-end flows is also the general direction torchao is moving in.
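As a rough illustration of what format-aware checkpoint I/O inside a recipe could look like, here is a sketch. The function names and the `fmt` parameter are assumptions for illustration, not an existing torchtune API:

```python
# Hypothetical sketch of format-aware state-dict I/O inside a recipe.
# `load_state_dict`/`save_state_dict` and the `fmt` parameter are
# illustrative assumptions, not part of the current torchtune API.
import torch
from safetensors.torch import load_file, save_file


def load_state_dict(path: str, fmt: str) -> dict:
    # Load weights from either .safetensors or .pth/.pt files.
    if fmt == "safetensors":
        return load_file(path)
    return torch.load(path, map_location="cpu", weights_only=True)


def save_state_dict(sd: dict, path: str, fmt: str) -> None:
    # Save weights in the format the downstream consumer expects.
    if fmt == "safetensors":
        save_file(sd, path)
    else:
        torch.save(sd, path)
```

With something like this wired into the checkpointer config, a recipe could read `.safetensors` in and write `.pth` out (or vice versa) without a separate conversion script.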
Related torchtitan issue: pytorch/torchtitan#1177
Related executorch issue: pytorch/executorch#3303