Nice. I wish it were a little easier to integrate these models into chat UIs like the one from Vercel, or even a simple Gradio app.
Does anyone have any Spaces/Colab notebooks/etc to try this out on?
Thanks!
There are many UIs for running locally, but the easiest is koboldcpp:
https://github.com/LostRuins/koboldcpp
It's a llama.cpp wrapper descended from the roleplaying community, but it works fine (and performs well) for question answering and similar tasks.
You will need to download the model from HF and quantize it yourself: https://github.com/ggerganov/llama.cpp#prepare-data--run
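For hooking it into a Gradio app or another chat UI: as far as I know, koboldcpp serves a KoboldAI-style HTTP API once it is running. Here is a rough sketch of calling it from Python with only the standard library. The port (5001), the endpoint path, the field names, and the response shape are my assumptions about a default setup, so check them against your install. The helper names are made up for illustration.

```python
import json
import urllib.request

# Assumption: koboldcpp is running locally on its default port and exposes
# a KoboldAI-compatible generate endpoint. Adjust the URL if yours differs.
API_URL = "http://localhost:5001/api/v1/generate"


def build_payload(prompt, max_length=120, temperature=0.7):
    """Build the JSON body for a generate request (field names assumed)."""
    return {"prompt": prompt, "max_length": max_length, "temperature": temperature}


def extract_text(response):
    """Pull the generated text out of a decoded JSON response (shape assumed)."""
    return response["results"][0]["text"]


def generate(prompt, url=API_URL):
    """Send the prompt to the local server and return the completion."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return extract_text(json.load(reply))


if __name__ == "__main__":
    # Requires a running koboldcpp instance with a model loaded.
    print(generate("Q: What is a llama.cpp wrapper?\nA:"))
```

A chat UI would call something like generate() from its message handler and keep the running conversation in the prompt string itself.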