A LoRA is a layer on top of a model. The big deal isn't that this exists (it's a LoRA for the weakest LLaMA), but that they shared their dataset. The stronger LLaMAs trained with this data will produce even better LoRAs and better results.

What is a lora or llama? Google gives me nothing.

LLaMA is the large language model published by Facebook (https://ai.facebook.com/blog/large-language-model-llama-meta...). In theory the model is private, but the model weights were shared with researchers and quickly leaked to the wider Internet. This is one of the first large language models available to ordinary people, much like Stable Diffusion is an image generation model available to ordinary people in contrast to DALL-E or MidJourney.

With the model's weights in the open, people can do interesting generative stuff. However, it's still hard to train the model to do new things: training large language models is famously expensive because of both their raw size and their structure. Enter...

LoRA is a "low rank adaptation" technique for training large language models, fairly recently published by Microsoft (https://github.com/microsoft/LoRA). In brief, the technique assumes that fine-tuning a model really just involves tweaks to the model parameters that are "small" in some sense. Mathematically, it represents each weight update as the product of two small low-rank matrices, and only those matrices are trained while the original weights stay frozen. Rather than asking an ordinary person to re-train 7 billion or 13 billion or 65 billion parameters, LoRA lets users fine-tune a model with orders of magnitude fewer adjustment parameters.
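To make that concrete, here's a minimal NumPy sketch of the idea (illustrative only, not the actual Microsoft implementation; the layer sizes, rank, and `alpha` scaling value are made up for the example). The frozen pretrained weight `W` is left untouched; only the two small factors `A` and `B` would be trained, and their product is the low-rank update to `W`:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 4096, 4096, 8  # hypothetical layer size and LoRA rank

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection (init 0)
alpha = 16                                 # scaling hyperparameter

def lora_forward(x):
    # Original path plus the low-rank correction, scaled by alpha / r.
    # Because B starts at zero, the adapted layer initially behaves
    # exactly like the pretrained one.
    return W @ x + (alpha / r) * (B @ (A @ x))

full_params = W.size              # what full fine-tuning would touch
lora_params = A.size + B.size     # what LoRA actually trains
print(f"full fine-tune parameters: {full_params:,}")    # 16,777,216
print(f"LoRA trainable parameters: {lora_params:,}")    # 65,536
print(f"reduction: {full_params / lora_params:.0f}x")   # 256x
```

For this one layer, the trainable parameter count drops by a factor of 256; applied across a whole transformer, that's how fine-tuning becomes feasible on consumer hardware.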

Combine these two – publicly-available language model weights and a cheap way to fine-tune them – and you get work like the story here, where the language model is turned into something a lot like ChatGPT that can run on a consumer-grade laptop.