What does HackerNews think of LoRA?

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Language: Python

#23 in Deep learning
#8 in R
I just read the LoRA paper. The main idea is that you write the weight matrix of each layer in the network as

W = W0 + B A

where W0 is the pretrained model’s weight matrix, which is kept fixed, and B and A are two much smaller matrices whose product has a much lower rank than the original (say r = 4).

It has been shown (as mentioned in the LoRA paper) that fine-tuning for specific tasks results in low-rank corrections, which is what this is all about. I think LoRA training can be done locally.

[1] https://github.com/microsoft/LoRA
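
A rough NumPy sketch of the update above (illustrative only; the rank r = 4 comes from the comment, the dimensions and names are made up):

    import numpy as np

    d, k, r = 768, 768, 4        # original weight is d x k, the update has rank at most r

    W0 = np.random.randn(d, k)   # pretrained weights, kept frozen
    B = np.zeros((d, r))         # trainable; starts at zero so W == W0 before fine-tuning
    A = np.random.randn(r, k)    # trainable

    # Effective weights used in the forward pass: W = W0 + B A
    W = W0 + B @ A

    # Only B and A are updated during fine-tuning:
    # 2 * 768 * 4 = 6,144 parameters versus 768 * 768 = 589,824 in the original matrix.
    print(B.size + A.size, W0.size)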

There's a difference between buzzwords and jargon. Buzzwords can start out as jargon, but have their technical meaning stripped by users who are just trying to sound persuasive. Examples include words like synergy, vertical, dynamic, cyber strategy, and NFT.

That's not what's happening in the parent comment. They're talking about projects like

https://github.com/ZrrSkywalker/LLaMA-Adapter

https://github.com/microsoft/LoRA

https://github.com/tloen/alpaca-lora

and specifically the paper: https://arxiv.org/pdf/2106.09685.pdf

LoRA is just a way to re-train a network with less effort. Before, we had to fiddle with all the weights; with LoRA we only touch roughly 1 in every 10,000 weights.
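
Rough back-of-the-envelope arithmetic behind that ratio, using GPT-3-like dimensions (illustrative; the exact figure depends on the rank and on which matrices you adapt):

    d_model = 12288              # hidden size of a GPT-3-scale model
    n_layers = 96
    r = 4                        # LoRA rank

    full_params = 175e9          # roughly all weights in the full model

    # LoRA applied to two projection matrices per layer: each d_model x d_model
    # matrix gets a pair of small matrices, d_model x r and r x d_model.
    lora_params = n_layers * 2 * (2 * d_model * r)

    print(lora_params)                # ~18.9 million trainable parameters
    print(full_params / lora_params)  # ~9,000x fewer -- the same order as "1 in 10,000"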

The parent comment says GPT4all doesn't give us a way to train the full-size LLaMA model using the new LoRA technique. We'll have to build that ourselves. But it does give us a huge, very clean dataset to work with, which will aid us in the quest to create an open-source ChatGPT killer.

LLaMA is the large language model published by Facebook (https://ai.facebook.com/blog/large-language-model-llama-meta...). In theory the model is private, but the model weights were shared with researchers and quickly leaked to the wider Internet. This is one of the first large language models available to ordinary people, much like Stable Diffusion is an image generation model available to ordinary people in contrast to DALL-E or MidJourney.

With the model's weights openly available, people can do interesting generative stuff. However, it's still hard to train the model to do new things: training large language models is famously expensive because of both their raw size and their structure. Enter...

LoRA is a "low rank adaptation" technique for training large language models, fairly recently published by Microsoft (https://github.com/microsoft/LoRA). In brief, the technique assumes that fine-tuning a model really just involves tweaks to the model parameters that are "small" in some sense, and through math this algorithm confines the fine-tuning to just the small adjustment weights. Rather than asking an ordinary person to re-train 7 billion or 11 billion or 65 billion parameters, LoRA lets users fine-tune a model with about three orders of magnitude fewer adjustment parameters.

Combine these two – publicly-available language model weights and a way to fine-tune them – and you get work like the story here, where the language model is turned into something a lot like ChatGPT that can run on a consumer-grade laptop.