George Hotz already implemented LLaMA 7B and 13B yesterday on Twitch, running on GPU in the tinygrad llama branch:

https://github.com/geohot/tinygrad/tree/llama

The only problem is that it swaps on a 16GB MacBook, so in practice you need at least 24GB.

There is also a GPU-accelerated fork of the original repo:

https://github.com/remixer-dec/llama-mps