Are there any online communities running these models on non-professional hardware? I keep running into poor documentation or outdated scripts with GPT-NeoX, BLOOM, and even Stable Diffusion 2. It seems like most of the support targets either professionals with clusters of A100s or consumers who aren't touching code at all. I have three 16GB Quadro GPUs, but getting this stuff running on them has been surprisingly difficult.

Cards I have seen LLaMA run on in 8-bit and 4-bit include: GTX 1660, RTX 2060, AMD RX 5700 XT, RTX 3050, RTX 3060, AMD RX 6900 XT, RTX 2060 12GB, RTX 3060 12GB, RTX 3080, P5000, RTX A2000, RTX 3080 20GB, RTX A4500, RTX A5000, RTX 3090, RTX 4090, RTX 6000, Tesla V100, A100 40GB, A40, RTX A6000, RTX 8000, Titan Ada
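
For a multi-GPU setup like three 16GB cards, the usual route is Hugging Face transformers with bitsandbytes int8 quantization and a device map that shards layers across GPUs. A minimal sketch, assuming a transformers version that accepts the `load_in_8bit` kwarg; the model repo id is a placeholder and the memory caps are just there to leave headroom on each 16GB card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "decapoda-research/llama-7b-hf"  # placeholder repo id; swap in whatever weights you have

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,    # bitsandbytes int8 quantization
    device_map="auto",    # shard layers across all visible GPUs
    max_memory={0: "14GiB", 1: "14GiB", 2: "14GiB"},  # headroom per 16GB card
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(0)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```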

Mostly using https://github.com/oobabooga/text-generation-webui/, the AUTOMATIC1111 of textgen.
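
The webui wraps the same loading path behind CLI flags, including 8-bit mode and per-GPU memory caps for splitting a model across cards. Something like the line below, though flag names have changed between versions, so check `python server.py --help`; the model folder name is a placeholder:

```
python server.py --model llama-13b --load-in-8bit --gpu-memory 14 14 14
```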