How long before someone creates a simple GUI for this?

That, plus a bit of optimisation, and everyone with a newer Mac or iPhone will be able to run something akin to ChatGPT locally!

Isn't this a pretty crazy development? Just weeks ago people said this would be impossible.

From this thread, the 13B model runs just as fast as ChatGPT on an M2 MacBook Air, and it's not even using the Neural Engine yet, so it should get significantly faster once that is utilised. Wow!

People have been running LLaMA in 4-bit quickly on cheap hardware, with a simple GUI, for over a week using https://github.com/oobabooga/text-generation-webui

Just not on Macs (that repo does not support Apple Silicon).