George Hotz already implemented LLaMA 7B and 13B on Twitch yesterday, running on GPU in the tinygrad llama branch:
https://github.com/geohot/tinygrad/tree/llama
The only problem is that it swaps on a 16 GB MacBook, so in practice you need at least 24 GB of RAM.
There is also a GPU-accelerated fork of the original repo.