Can someone provide a guide on how to run LLaMA on a fairly average CPU or Nvidia GPU?
Another great option is https://github.com/oobabooga/text-generation-webui
The 7B model will run without changes on a 3080, and the 13B model also fits once quantized to 4-bit.
This Reddit post has the instructions I followed: https://old.reddit.com/r/LocalLLaMA/comments/11o6o3f/how_to_...
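Those sizes line up with a rough back-of-envelope VRAM estimate (parameters × bytes per weight). This is only a sketch: real memory use is higher because of activations, the KV cache, and framework overhead, and the 10 GB figure assumes the common 10 GB 3080 variant.

```python
def vram_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Approximate VRAM needed just to hold the weights, in GB."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

# 7B at fp16 would already exceed a 10 GB card, which is why
# quantization (8-bit or 4-bit) matters on a 3080:
print(f"7B  fp16 : {vram_gb(7, 16):.1f} GB")   # ~14 GB, too big
print(f"7B  8-bit: {vram_gb(7, 8):.1f} GB")    # ~7 GB, fits
print(f"13B 4-bit: {vram_gb(13, 4):.1f} GB")   # ~6.5 GB, fits
```

So 13B only becomes viable on a 10 GB card after 4-bit quantization, matching the comment above.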