Can someone provide a guide on how to run LLaMA on a fairly average CPU or Nvidia GPU?
Another great option is https://github.com/oobabooga/text-generation-webui
The 7B model will run without changes on a 3080, and the 13B model also fits once quantized to 4-bit.
This Reddit post has the instructions I followed: https://old.reddit.com/r/LocalLLaMA/comments/11o6o3f/how_to_...
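Those sizes line up with a rough back-of-envelope VRAM estimate (parameters × bytes per weight). This is only a sketch: real memory use is higher because of activations, the KV cache, and framework overhead, and the 10 GB figure assumes the common 10 GB 3080 variant.

```python
def vram_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Approximate VRAM needed just to hold the weights, in GB."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

# 7B at fp16 would already exceed a 10 GB card, which is why
# quantization (8-bit or 4-bit) matters on a 3080:
print(f"7B  fp16 : {vram_gb(7, 16):.1f} GB")   # ~14 GB, too big
print(f"7B  8-bit: {vram_gb(7, 8):.1f} GB")    # ~7 GB, fits
print(f"13B 4-bit: {vram_gb(13, 4):.1f} GB")   # ~6.5 GB, fits
```

So 13B only becomes viable on a 10 GB card after 4-bit quantization, matching the comment above.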