
[bug] DoRA is broken #1903

Closed

@ebsmothers

Description

Two separate DoRA bugs I just noticed:

(1) Llama 3.2 1B config with DoRA errors on state dict load. Repro:

tune run lora_finetune_single_device --config llama3_2/1B_lora_single_device \
gradient_accumulation_steps=1 max_steps_per_epoch=5 model.use_dora=True
...
Exception: Error converting the state dict. Found unexpected key: "layers.0.attn.q_proj.magnitude". Please make sure you're loading a checkpoint with the right format.
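
For context, my guess at what's going on (not a confirmed root cause): the checkpoint conversion maps every key in the state dict to its HF name and raises on anything it doesn't recognize, and the DoRA `magnitude` parameter isn't in that mapping. A rough sketch of the kind of handling I'd expect, using a hypothetical helper and plain dict filtering rather than torchtune's actual converter:

```python
# Sketch only (not torchtune's actual converter): the DoRA magnitude
# parameter needs to be handled explicitly instead of hitting the
# "unexpected key" branch. All names below are hypothetical.

def split_dora_magnitudes(state_dict):
    """Separate DoRA magnitude params from the keys the HF key-mapping knows about."""
    magnitudes = {}
    remaining = {}
    for key, value in state_dict.items():
        if key.endswith(".magnitude"):
            # Keep these with the adapter weights; don't try to map them to HF names.
            magnitudes[key] = value
        else:
            remaining[key] = value
    return remaining, magnitudes


if __name__ == "__main__":
    import torch

    sd = {
        "layers.0.attn.q_proj.weight": torch.zeros(4, 4),
        "layers.0.attn.q_proj.magnitude": torch.ones(4),
    }
    base_sd, dora_sd = split_dora_magnitudes(sd)
    assert "layers.0.attn.q_proj.magnitude" in dora_sd
```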

(2) Llama 3.2 Vision 11B model with DoRA has NaN loss. Repro:

tune run lora_finetune_single_device --config llama3_2_vision/11B_lora_single_device \
max_steps_per_epoch=5 gradient_accumulation_steps=1 model.use_dora=True

Once we fix them, we should add recipe test cases setting model.use_dora=True to catch these errors in the future, cc @felipemello1.
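
Something along these lines could work for the recipe tests. This is only a sketch that shells out to the CLI with the same overrides as the repro commands above; the real test should plug into the existing recipe test harness and small test checkpoints, and the `integration_test` marker is an assumption.

```python
# Sketch of a recipe test, assuming we can just shell out to `tune run`.
# The real test should reuse the existing recipe test fixtures/checkpoints;
# the config names and overrides below mirror the repro commands in this issue.
import subprocess

import pytest


@pytest.mark.integration_test
@pytest.mark.parametrize(
    "config",
    ["llama3_2/1B_lora_single_device", "llama3_2_vision/11B_lora_single_device"],
)
def test_lora_finetune_single_device_with_dora(config):
    cmd = [
        "tune", "run", "lora_finetune_single_device",
        "--config", config,
        "max_steps_per_epoch=5",
        "gradient_accumulation_steps=1",
        "model.use_dora=True",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    # Should finish without the state-dict key error or a NaN-loss crash.
    assert result.returncode == 0, result.stderr
```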
