OPT-175B weights are already openly available, as I understand it. Hugging Face also has openly available weights for a 176B-parameter LLM called BLOOM. Is LLaMA offering something over and above these?

Yeah, their recent paper shows the smaller LLaMA models outperforming much larger LLMs, and they also have bigger models. This isn't just an alternative; it's a multi-order-of-magnitude optimization.

https://aibusiness.com/meta/meta-s-llama-language-model-outp...

Can I spend $5K and run it at home? What GPU(s) do I need?

The 7B model runs on a CUDA-compatible card with 16GB of VRAM (assuming your card has 16-bit float support).
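To see why 16GB is roughly the floor, here's a back-of-envelope sketch: fp16 weights cost 2 bytes per parameter, plus some headroom for activations and the KV cache. The overhead figure below is a guess, not a measured number.

```python
# Rough VRAM estimate for fp16 inference.
# overhead_gb (activations + KV cache) is an assumed ballpark, not measured.

def fp16_vram_gb(n_params_billion, overhead_gb=2.0):
    """Weights at 2 bytes/param, converted to GiB, plus rough overhead."""
    weights_gb = n_params_billion * 1e9 * 2 / 1024**3
    return weights_gb + overhead_gb

# 7B weights alone are ~13 GiB in fp16, so a 16GB card is a tight fit.
print(round(fp16_vram_gb(7), 1))
```

The same arithmetic shows why 8-bit or 4-bit quantization is attractive: halving or quartering the bytes per parameter pulls larger models under consumer-card VRAM limits.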

I only got the 30B model running on a 4x Nvidia A40 setup, though.

The 30B checkpoint is 64.8GB and the A40s have 48GB VRAM each - so does this mean you got it working on one GPU with an NVLink to a second, or is it really running on all 4 A40s?

Is there a sub/forum/discord where folks talk about the nitty-gritty?

> so does this mean you got it working on one GPU with an NVLink to a 2nd, or is it really running on all 4 A40s?

It's sharded across all 4 GPUs (as per the README here: https://github.com/facebookresearch/llama). I'd wait a few weeks to a month for people to settle on a solution for running the model; right now people are just throwing PyTorch code at the wall and seeing what sticks.
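The arithmetic behind the 4-GPU requirement, as I understand the release: the checkpoints ship pre-sharded (4 files for 30B), so the model-parallel degree has to match the shard count rather than being chosen freely. A quick sanity check, using the sizes quoted above:

```python
# Why 4 GPUs for 30B: the released checkpoint is split into 4 shards
# (shard count per the repo README), and each GPU loads one shard.

checkpoint_gb = 64.8   # 30B fp16 checkpoint size quoted upthread
a40_vram_gb = 48.0     # per-card VRAM on an A40
shards = 4             # checkpoint files for the 30B release

per_gpu_gb = checkpoint_gb / shards

# Weights alone exceed one A40, so a single card can't hold the model...
print(checkpoint_gb > a40_vram_gb)
# ...but a 4-way split leaves each card holding ~16 GiB, comfortably under 48.
print(round(per_gpu_gb, 1))
```

So two A40s would fit the weights memory-wise, but the 4-way layout is baked into how the checkpoint was saved; resharding it is exactly the kind of PyTorch surgery people are experimenting with right now.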