Could you explain how it works?
1. Running the model: it's built on the open-source (and amazing) llama.cpp project, which runs quantized (i.e. compressed) models like Llama 2 (which launched yesterday) so they can fit in memory on even a commodity Mac. It takes their "server" example as a starting point (there's a rough sketch of that kind of call right after this list).
2. Downloading and storing models: models are distributed in a way that verifies their integrity and lets them be re-used as much as possible (since they are large files!). For this we take an approach similar to Docker's registry (https://github.com/distribution/distribution); the second sketch after this list shows the gist.
3. Creating custom models: models can be extended with a new concept we're experimenting with: a Modelfile. It effectively adds "layers" to a model, so you can distribute the model data together and keep it self-contained. This builds on point 2; our hope is that it will make it easier to adapt models like Llama 2 to your own use cases (e.g. a character). There's an example Modelfile at the end of this comment.
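To make point 1 a bit more concrete, here's a rough sketch (in Go, not our actual code) of what talking to a llama.cpp-style "server" over HTTP looks like. The /completion route and field names mirror llama.cpp's server example, but treat them as illustrative rather than a spec:

    // Minimal sketch: ask a llama.cpp "server"-style endpoint for a completion.
    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "net/http"
    )

    func main() {
        // Field names follow llama.cpp's server example (illustrative only).
        body, _ := json.Marshal(map[string]interface{}{
            "prompt":    "Why is the sky blue?",
            "n_predict": 64, // cap on generated tokens
        })
        resp, err := http.Post("http://localhost:8080/completion", "application/json", bytes.NewReader(body))
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        // The server replies with JSON containing the generated text.
        var out struct {
            Content string `json:"content"`
        }
        if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
            panic(err)
        }
        fmt.Println(out.Content)
    }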
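For point 2, the Docker-registry idea boils down to naming each model blob by the sha256 digest of its contents, so a download can be verified and a blob shared by two models is only stored once. A hypothetical sketch (the store layout and helper name are made up for illustration):

    package main

    import (
        "crypto/sha256"
        "fmt"
        "io"
        "os"
        "path/filepath"
    )

    // storeBlob hashes a model file and copies it into a content-addressed
    // store, returning its digest. Identical content always maps to the same
    // path, so a layer shared by two models is stored (and downloaded) once.
    func storeBlob(storeDir, srcPath string) (string, error) {
        src, err := os.Open(srcPath)
        if err != nil {
            return "", err
        }
        defer src.Close()

        h := sha256.New()
        if _, err := io.Copy(h, src); err != nil {
            return "", err
        }
        digest := fmt.Sprintf("sha256-%x", h.Sum(nil))

        dst := filepath.Join(storeDir, digest)
        if _, err := os.Stat(dst); err == nil {
            return digest, nil // blob already present: nothing to do
        }
        if _, err := src.Seek(0, io.SeekStart); err != nil {
            return "", err
        }
        out, err := os.Create(dst)
        if err != nil {
            return "", err
        }
        defer out.Close()
        _, err = io.Copy(out, src)
        return digest, err
    }

    func main() {
        digest, err := storeBlob(os.TempDir(), os.Args[1])
        if err != nil {
            panic(err)
        }
        fmt.Println("stored as", digest)
    }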
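And for point 3, here's roughly what a Modelfile for the "character" use case might look like. Since this is still experimental, treat the exact directives as illustrative and subject to change:

    # Modelfile: layer a persona on top of the base Llama 2 model
    FROM llama2

    # a sampling parameter layered over the base model's defaults
    PARAMETER temperature 1

    # a system prompt baked into the custom model
    SYSTEM """
    You are Mario from Super Mario Bros. Answer every question as Mario.
    """

Building and running the custom model would then be something like:

    ollama create mario -f ./Modelfile
    ollama run mario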