Nice! I wish it were a little easier to integrate these models into chat UIs like the one from Vercel, or even a simple Gradio app.

Does anyone have a Space, Colab notebook, etc. to try this out on?

Thanks!

There are many UIs for running locally, but the easiest is koboldcpp:

https://github.com/LostRuins/koboldcpp

It's a llama.cpp wrapper descended from the roleplaying community, but it works fine (and fast) for question answering and such.

You will need to download the model from HF and quantize it yourself: https://github.com/ggerganov/llama.cpp#prepare-data--run
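
If you would rather hook it up to your own UI (Gradio or otherwise) than use the built-in one, koboldcpp also serves a KoboldAI-style HTTP API once it is running. Here is a rough sketch of calling it from Python, using only the standard library. Assumptions on my part: the server is on localhost at the default port 5001, the endpoint is `/api/v1/generate`, and the reply is JSON shaped like `{"results": [{"text": ...}]}`. The helper names (`build_payload`, `extract_text`, `generate`) are made up for illustration, so check the API docs for your koboldcpp version.

```python
import json
import urllib.request

# Assumption: koboldcpp is running locally on its default port (5001)
# and exposes the KoboldAI-compatible generate endpoint.
API_URL = "http://localhost:5001/api/v1/generate"


def build_payload(prompt, max_length=200, temperature=0.7):
    """Build the JSON body for a generate request (hypothetical helper)."""
    return {
        "prompt": prompt,
        "max_length": max_length,    # number of tokens to generate
        "temperature": temperature,  # sampling temperature
    }


def extract_text(response):
    """Pull the generated text out of a parsed JSON reply.

    Assumes the reply looks like {"results": [{"text": "..."}]};
    returns an empty string if there are no results.
    """
    results = response.get("results") or []
    return results[0].get("text", "") if results else ""


def generate(prompt, url=API_URL, timeout=120, **kwargs):
    """Send a prompt to the server and return the completion as a string."""
    body = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return extract_text(json.load(reply))
```

With the server up, something like `generate("Question: What is llama.cpp?\nAnswer:")` should hand back the completion as a plain string, which you can then pass to whatever chat front end you like.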