Two separate DoRA bugs I just noticed:
(1) Llama 3.2 1B config with DoRA errors on state dict load. Repro:
```bash
tune run lora_finetune_single_device --config llama3_2/1B_lora_single_device \
  gradient_accumulation_steps=1 max_steps_per_epoch=5 model.use_dora=True
```

```
...
Exception: Error converting the state dict. Found unexpected key: "layers.0.attn.q_proj.magnitude". Please make sure you're loading a checkpoint with the right format.
```
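For context on (1): the exception comes from state-dict key conversion, and DoRA adds a per-projection `magnitude` parameter that a plain-LoRA key mapping presumably doesn't know about. The sketch below is a minimal illustration of that failure mode with hypothetical mapping/function names (it is not torchtune's actual converter code); if that reading is right, the fix is likely an extra mapping entry for the DoRA key.

```python
import re

# Minimal sketch of the suspected failure mode in (1) -- hypothetical names, not
# torchtune's actual converter. DoRA adds a per-projection "magnitude" parameter,
# and a key mapping that only covers base/LoRA weights treats it as unexpected.
_KEY_MAP = {
    "layers.{}.attn.q_proj.weight": "model.layers.{}.self_attn.q_proj.weight",
    "layers.{}.attn.q_proj.lora_a.weight": "model.layers.{}.self_attn.q_proj.lora_A.weight",
    "layers.{}.attn.q_proj.lora_b.weight": "model.layers.{}.self_attn.q_proj.lora_B.weight",
    # No entry for "layers.{}.attn.q_proj.magnitude" -> conversion fails.
}


def convert_key(key: str) -> str:
    """Map one state-dict key to the target checkpoint format; raise on unknown keys."""
    layer = re.search(r"\.(\d+)\.", key)
    template = re.sub(r"\.\d+\.", ".{}.", key)
    if template not in _KEY_MAP:
        raise Exception(
            f'Error converting the state dict. Found unexpected key: "{key}". '
            "Please make sure you're loading a checkpoint with the right format."
        )
    new_key = _KEY_MAP[template]
    return new_key.format(layer.group(1)) if layer else new_key


# Reproduces the reported error:
# convert_key("layers.0.attn.q_proj.magnitude")
# A likely fix is to teach the converter about the DoRA key, e.g.:
# _KEY_MAP["layers.{}.attn.q_proj.magnitude"] = "model.layers.{}.self_attn.q_proj.magnitude"
```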
(2) Llama 3.2 Vision 11B model with DoRA has NaN loss. Repro:
```bash
tune run lora_finetune_single_device --config llama3_2_vision/11B_lora_single_device \
  max_steps_per_epoch=5 gradient_accumulation_steps=1 model.use_dora=True
```
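There's no exception for (2), so the first step is probably localizing where the non-finite values appear. Below is a small, generic PyTorch debugging sketch (an assumption about approach, not part of the recipe or a fix): forward hooks that report the first modules emitting NaN/Inf, plus a sanity check that DoRA `magnitude` parameters are finite and non-zero after setup, since a bad magnitude init is one plausible way to get a NaN loss.

```python
import torch
import torch.nn as nn


def add_nan_hooks(model: nn.Module) -> None:
    """Report modules whose forward outputs contain NaN/Inf.

    Hooks fire as each submodule finishes its forward pass, so the earliest
    report points at roughly where non-finite values first appear.
    """
    def make_hook(name: str):
        def hook(module, inputs, output):
            tensors = output if isinstance(output, (tuple, list)) else (output,)
            for t in tensors:
                if isinstance(t, torch.Tensor) and not torch.isfinite(t).all():
                    print(f"non-finite output from {name} ({type(module).__name__})")
                    break
        return hook

    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))


def check_dora_magnitudes(model: nn.Module) -> None:
    """Flag DoRA magnitude params that are non-finite or all zero after setup."""
    for name, param in model.named_parameters():
        if name.endswith("magnitude"):
            if not torch.isfinite(param).all() or param.abs().sum() == 0:
                print(f"suspicious magnitude init: {name}")
```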
Once we fix them, we should add recipe test cases that set model.use_dora=True so regressions like these are caught in the future (cc @felipemello1); a rough sketch of such a test is below.
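This is only a sketch of what such a recipe test could look like, assuming we shell out to the `tune run` CLI; the real tests would presumably reuse torchtune's existing recipe-test harness and small test configs rather than the full 1B config, and the `nan` check on stdout is deliberately coarse.

```python
import subprocess

import pytest


@pytest.mark.parametrize("use_dora", [True])
def test_lora_recipe_with_dora(use_dora, tmp_path):
    # Run a few steps of the LoRA single-device recipe with DoRA enabled.
    cmd = [
        "tune", "run", "lora_finetune_single_device",
        "--config", "llama3_2/1B_lora_single_device",
        f"output_dir={tmp_path}",
        "gradient_accumulation_steps=1",
        "max_steps_per_epoch=5",
        f"model.use_dora={use_dora}",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    # The run should finish without the state-dict conversion error (bug 1)
    # and without reporting a NaN loss anywhere in its output (bug 2).
    assert result.returncode == 0, result.stderr
    assert "nan" not in result.stdout.lower()
```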