Two separate DoRA bugs I just noticed:
(1) Llama 3.2 1B config with DoRA errors on state dict load. Repro:
```bash
tune run lora_finetune_single_device --config llama3_2/1B_lora_single_device \
  gradient_accumulation_steps=1 max_steps_per_epoch=5 model.use_dora=True
```

```
...
Exception: Error converting the state dict. Found unexpected key: "layers.0.attn.q_proj.magnitude". Please make sure you're loading a checkpoint with the right format.
```
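For context on (1): the exception comes from state-dict key conversion, and DoRA adds a per-projection `magnitude` parameter that a plain-LoRA key mapping presumably doesn't know about. The sketch below is a minimal illustration of that failure mode with hypothetical mapping/function names (it is not torchtune's actual converter code); if that reading is right, the fix is likely an extra mapping entry for the DoRA key.

```python
import re

# Minimal sketch of the suspected failure mode in (1) -- hypothetical names, not
# torchtune's actual converter. DoRA adds a per-projection "magnitude" parameter,
# and a key mapping that only covers base/LoRA weights treats it as unexpected.
_KEY_MAP = {
    "layers.{}.attn.q_proj.weight": "model.layers.{}.self_attn.q_proj.weight",
    "layers.{}.attn.q_proj.lora_a.weight": "model.layers.{}.self_attn.q_proj.lora_A.weight",
    "layers.{}.attn.q_proj.lora_b.weight": "model.layers.{}.self_attn.q_proj.lora_B.weight",
    # No entry for "layers.{}.attn.q_proj.magnitude" -> conversion fails.
}


def convert_key(key: str) -> str:
    """Map one state-dict key to the target checkpoint format; raise on unknown keys."""
    layer = re.search(r"\.(\d+)\.", key)
    template = re.sub(r"\.\d+\.", ".{}.", key)
    if template not in _KEY_MAP:
        raise Exception(
            f'Error converting the state dict. Found unexpected key: "{key}". '
            "Please make sure you're loading a checkpoint with the right format."
        )
    new_key = _KEY_MAP[template]
    return new_key.format(layer.group(1)) if layer else new_key


# Reproduces the reported error:
# convert_key("layers.0.attn.q_proj.magnitude")
# A likely fix is to teach the converter about the DoRA key, e.g.:
# _KEY_MAP["layers.{}.attn.q_proj.magnitude"] = "model.layers.{}.self_attn.q_proj.magnitude"
```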
(2) Llama 3.2 Vision 11B model with DoRA has NaN loss. Repro:
```bash
tune run lora_finetune_single_device --config llama3_2_vision/11B_lora_single_device \
  max_steps_per_epoch=5 gradient_accumulation_steps=1 model.use_dora=True
```
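There's no exception for (2), so the first step is probably localizing where the non-finite values appear. Below is a small, generic PyTorch debugging sketch (an assumption about approach, not part of the recipe or a fix): forward hooks that report the first modules emitting NaN/Inf, plus a sanity check that DoRA `magnitude` parameters are finite and non-zero after setup, since a bad magnitude init is one plausible way to get a NaN loss.

```python
import torch
import torch.nn as nn


def add_nan_hooks(model: nn.Module) -> None:
    """Report modules whose forward outputs contain NaN/Inf.

    Hooks fire as each submodule finishes its forward pass, so the earliest
    report points at roughly where non-finite values first appear.
    """
    def make_hook(name: str):
        def hook(module, inputs, output):
            tensors = output if isinstance(output, (tuple, list)) else (output,)
            for t in tensors:
                if isinstance(t, torch.Tensor) and not torch.isfinite(t).all():
                    print(f"non-finite output from {name} ({type(module).__name__})")
                    break
        return hook

    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))


def check_dora_magnitudes(model: nn.Module) -> None:
    """Flag DoRA magnitude params that are non-finite or all zero after setup."""
    for name, param in model.named_parameters():
        if name.endswith("magnitude"):
            if not torch.isfinite(param).all() or param.abs().sum() == 0:
                print(f"suspicious magnitude init: {name}")
```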
Once we fix them, we should add recipe test cases that set model.use_dora=True so regressions like these are caught in the future (cc @felipemello1); a rough sketch of such a test is below.
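This is only a sketch of what such a recipe test could look like, assuming we shell out to the `tune run` CLI; the real tests would presumably reuse torchtune's existing recipe-test harness and small test configs rather than the full 1B config, and the `nan` check on stdout is deliberately coarse.

```python
import subprocess

import pytest


@pytest.mark.parametrize("use_dora", [True])
def test_lora_recipe_with_dora(use_dora, tmp_path):
    # Run a few steps of the LoRA single-device recipe with DoRA enabled.
    cmd = [
        "tune", "run", "lora_finetune_single_device",
        "--config", "llama3_2/1B_lora_single_device",
        f"output_dir={tmp_path}",
        "gradient_accumulation_steps=1",
        "max_steps_per_epoch=5",
        f"model.use_dora={use_dora}",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    # The run should finish without the state-dict conversion error (bug 1)
    # and without reporting a NaN loss anywhere in its output (bug 2).
    assert result.returncode == 0, result.stderr
    assert "nan" not in result.stdout.lower()
```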