LoRA is an efficient method of adapting a foundation model to specific tasks.

When adapting to a specific task, pre-trained language models have a low “intrinsic dimension” and can still learn efficiently despite a random projection to a smaller subspace. LoRA exploits this by freezing the pre-trained weights and injecting trainable rank decomposition matrices into each layer of the Transformer architecture, as shown in Figure 1 from the paper:

For a pre-trained weight matrix $W_0 \in \mathbb{R}^{d \times k}$, we constrain its update by representing the latter with a low-rank decomposition $W_0 + \Delta W = W_0 + BA$, where $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$, and the rank $r \ll \min(d, k)$. During training, $W_0$ is frozen and does not receive gradient updates, while $A$ and $B$ contain trainable parameters. Note both $W_0$ and $\Delta W = BA$ are multiplied with the same input, and their respective output vectors are summed coordinate-wise. For $h = W_0 x$, our modified forward pass yields:

$$h = W_0 x + \Delta W x = W_0 x + BAx$$
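To make the forward pass concrete, here is a minimal PyTorch sketch (not the paper's reference implementation): a hypothetical `LoRALinear` module wraps a frozen pre-trained weight $W_0$ and adds the trainable factors $B$ and $A$. The rank `r = 8`, the bias-free layout, and the initialisation scale are illustrative assumptions, and the paper's additional $\alpha / r$ scaling of $BAx$ is omitted for brevity.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Sketch of a LoRA-adapted linear layer: h = W0 x + B A x (bias ignored)."""

    def __init__(self, pretrained: nn.Linear, r: int = 8):
        super().__init__()
        d, k = pretrained.weight.shape  # W0 is d x k
        # W0 is the frozen pre-trained weight: it receives no gradient updates.
        self.W0 = pretrained.weight
        self.W0.requires_grad_(False)
        # Trainable rank decomposition: A (r x k) gets a Gaussian init,
        # B (d x r) is zero-initialised, so B A = 0 at the start of training.
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)
        self.B = nn.Parameter(torch.zeros(d, r))
        # (The paper additionally scales B A x by alpha / r; omitted here.)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Both W0 and BA are multiplied with the same input x,
        # and their outputs are summed coordinate-wise.
        return x @ self.W0.T + x @ self.A.T @ self.B.T
```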

As shown in Figure 1, $A$ is initialised with a random Gaussian and $B$ is initialised to zero, so $\Delta W = BA$ is zero at the beginning of training.
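Continuing the sketch above (still assuming PyTorch and the illustrative `LoRALinear`), a short usage check shows the two consequences of this setup: because $B$ starts at zero, the adapted layer initially reproduces the pre-trained one exactly, and only $A$ and $B$ appear as trainable parameters.

```python
# Usage: wrap a "pre-trained" layer (here a randomly initialised stand-in).
base = nn.Linear(1024, 1024, bias=False)
lora = LoRALinear(base, r=8)

x = torch.randn(4, 1024)
# B is zero-initialised, so B A = 0 and the adapted layer initially
# reproduces the pre-trained one exactly.
assert torch.allclose(lora(x), base(x))

# Only the low-rank factors are trainable; W0 stays frozen.
print([name for name, p in lora.named_parameters() if p.requires_grad])  # ['A', 'B']
```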

References

Hu, Edward J., Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. "LoRA: Low-Rank Adaptation of Large Language Models." arXiv preprint arXiv:2106.09685 (2021).