Description
Feature request
Paper: https://arxiv.org/abs/2502.01235 (ICML 2025 Oral Presentation)
Reference code: https://github.com/YuanheZ/LoRA-One
Content Overview
This paper explores how theory can guide and enhance practical algorithms, using Low-Rank Adaptation (LoRA) in large language models as a case study. We rigorously prove that, under gradient descent, LoRA adapters align with specific singular subspaces of the one-step full fine-tuning gradient. This result suggests that, by properly initializing the adapters using the one-step full gradient, subspace alignment can be achieved immediately, applicable to both linear and nonlinear models. Building on our theory, we propose a theory-driven algorithm, LoRA-One, for which linear convergence (as well as generalization guarantees) is established, and for which incorporating preconditioners theoretically helps mitigate the effects of ill-conditioning. In addition, our theory reveals connections between LoRA-One and other gradient-alignment-based methods, helping to clarify misconceptions in the design of such algorithms. LoRA-One achieves significant empirical improvements over LoRA and its variants across benchmarks in natural language understanding, mathematical reasoning, and code generation.
Main Contributions
We theoretically prove:
- standard LoRA aligns with the top-r singular subspace of the first-step full gradient (see the alignment sketch below);
- LoRA achieves fast linear convergence, in both optimization and generalization, if the adapters are initialized from the best rank-r approximation of the first-step full gradient.
Grounded in our theory, we establish an optimal gradient-based initialization and clarify the suboptimality of previous gradient-based methods such as LoRA-GA and LoRA-SB. Our method is supported by performance improvements across a wide range of instruction-following, math, and code benchmarks.
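To make the alignment statement concrete, here is a small self-contained sketch (not from the LoRA-One codebase) of how one could check it numerically: take the top-r left singular subspace of the one-step full gradient and measure the principal angles against the column space of the learned B adapter. The function names and the choice of returning the smallest cosine are illustrative assumptions.

```python
import torch

def top_r_left_subspace(grad: torch.Tensor, r: int) -> torch.Tensor:
    """Orthonormal basis of the top-r left singular subspace of the gradient."""
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    return U[:, :r]                                   # (out_features, r)

def subspace_alignment(B: torch.Tensor, U_r: torch.Tensor) -> float:
    """Smallest cosine of the principal angles between span(B) and span(U_r).

    A value close to 1.0 means the adapter column space has aligned with the
    top-r singular subspace of the first-step full gradient.
    """
    Q, _ = torch.linalg.qr(B)                         # orthonormalize span(B)
    cosines = torch.linalg.svdvals(U_r.T @ Q)         # cosines of principal angles
    return cosines.min().item()
```

Tracking this quantity during training of a randomly initialized LoRA adapter is one way to observe the alignment behavior described in the first bullet.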
Algorithmic Overview
For each weight matrix, we first compute the one-step full fine-tuning gradient $G = \nabla_W \mathcal{L}(W_0)$ and take its best rank-$r$ approximation $G_r = U_r S_r V_r^\top$ via SVD. Initializing the adapters from these singular factors so that $B_0 A_0 = -\eta\, G_r$ is equivalent to performing one best rank-$r$ full gradient descent step under full fine-tuning with learning rate $\eta$.
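A minimal sketch of this initialization in PyTorch, assuming the plain parametrization $W = W_0 + BA$ (ignoring PEFT's extra lora_alpha/r scaling) and a symbolic step size eta; the exact scaling and sign conventions in the official LoRA-One code may differ:

```python
import torch

def lora_one_init(grad: torch.Tensor, r: int, eta: float = 1e-2):
    """Spectral initialization from the one-step full fine-tuning gradient.

    grad: dL/dW at the pretrained weight W_0, shape (out_features, in_features).
    Returns (B, A) with B @ A = -eta * (best rank-r approximation of grad),
    so adding B @ A to W_0 performs one truncated full-gradient step.
    """
    U, S, Vh = torch.linalg.svd(grad, full_matrices=False)
    sqrt_S = torch.diag(S[:r].sqrt())
    B = -(eta ** 0.5) * U[:, :r] @ sqrt_S             # (out_features, r)
    A = (eta ** 0.5) * sqrt_S @ Vh[:r, :]             # (r, in_features)
    return B, A
```

Splitting the step size as sqrt(eta) between the two factors keeps B and A on a comparable scale; their product is exactly -eta times the rank-r truncated gradient, so the adapter starts inside the target singular subspaces rather than having to align to them during training.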
Experiments
Benchmark results on natural language understanding, mathematical reasoning, and code generation are reported in the paper and the reference repository above.
Your contribution
The code implementation is similar to PiSSA and LoRA-GA. The core idea is to replace the randomly initialized LoRA adapters with matrices obtained from an SVD. The one additional requirement is the first-step full gradient computation, which has been implemented via a custom PEFT version in LoRA-GA. Any suggestions or guidance on this would be welcome.
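For a rough sense of how this could sit on top of PEFT, here is a hedged sketch rather than a working implementation: it assumes the current LoraLayer layout with base_layer and lora_A/lora_B ModuleDicts on Linear layers, a model whose forward returns a .loss, and it naively stores every base-weight gradient at once, whereas a memory-efficient variant (as in LoRA-GA's custom PEFT) would be needed in practice.

```python
import torch

def collect_first_step_grads(peft_model, batch):
    """One forward/backward pass with the frozen base weights temporarily
    requiring grad, so the full fine-tuning gradient can be read off."""
    for _, module in peft_model.named_modules():
        if hasattr(module, "base_layer"):             # LoRA-wrapped layers
            module.base_layer.weight.requires_grad_(True)
    loss = peft_model(**batch).loss                   # placeholder supervised loss
    loss.backward()
    grads = {}
    for name, module in peft_model.named_modules():
        if hasattr(module, "base_layer") and module.base_layer.weight.grad is not None:
            grads[name] = module.base_layer.weight.grad.detach().clone()
            module.base_layer.weight.grad = None
            module.base_layer.weight.requires_grad_(False)   # re-freeze
    peft_model.zero_grad()
    return grads

@torch.no_grad()
def apply_spectral_init(peft_model, grads, r, eta=1e-2, adapter="default"):
    """Overwrite the random LoRA init with SVD factors of the one-step gradient.

    Note: PEFT multiplies B @ A by lora_alpha / r at forward time; that factor
    is ignored here and would have to be folded into eta.
    """
    for name, module in peft_model.named_modules():
        if hasattr(module, "lora_A") and name in grads:   # assumes Linear LoRA layers
            U, S, Vh = torch.linalg.svd(grads[name], full_matrices=False)
            sqrt_S = torch.diag(S[:r].sqrt())
            module.lora_B[adapter].weight.copy_(-(eta ** 0.5) * U[:, :r] @ sqrt_S)
            module.lora_A[adapter].weight.copy_((eta ** 0.5) * sqrt_S @ Vh[:r, :])
```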