> Fine-tuning is good for teaching it how to act, but not great for reciting/recalling data.
What underlying process makes it this way? Is it because the prompt has heavier weight?
I just read the LoRA paper. The main idea is that you write each weight matrix of the network as
W = W0 + B A
where W0 is the pretrained model's weight matrix, which is kept frozen, and B and A are much lower-rank matrices (say r = 4) that are the only parameters actually trained.
It has been shown (as mentioned in the LoRA paper) that fine-tuning for specific tasks results in low-rank weight updates, and that's what the method exploits. I think LoRA fine-tuning can be done locally.
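If it helps, here's a rough sketch of what that looks like in PyTorch. This is just an illustration of the W = W0 + BA idea, not the paper's reference implementation; the class name, init scale, and r = 4 are my own choices:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank correction: W0 x + B A x."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # W0 stays fixed
        d_out, d_in = base.out_features, base.in_features
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # r x d_in
        self.B = nn.Parameter(torch.zeros(d_out, r))         # d_out x r, zero init so BA starts at 0
        self.scale = alpha / r

    def forward(self, x):
        # original path plus the low-rank update
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# usage: wrap an existing layer and train only A and B
layer = LoRALinear(nn.Linear(768, 768), r=4)
trainable = [p for p in layer.parameters() if p.requires_grad]  # just A and B
```

Because only A and B get gradients, the number of trainable parameters drops from d_out x d_in to r x (d_out + d_in), which is why it's feasible on local hardware.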