What do you use to host these models (Vicuna, Dolly, etc.) on your own server and expose them over an HTTP REST API? Is there a Heroku-like service for LLMs?
I am looking for an open-source model to do text summarization. OpenAI is too expensive for my use case because I need to pass a lot of tokens.
I haven't tried them, but https://github.com/abetlen/llama-cpp-python and https://github.com/r2d4/openlm exist.
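For what it's worth, here is a rough sketch of what calling a self-hosted model for summarization could look like. It assumes llama-cpp-python's bundled server (started with something like `python -m llama_cpp.server --model <path-to-model>`) is running on localhost:8000 and serving an OpenAI-style `/v1/completions` route. The URL, the prompt wording, and the helper names `build_summary_request` and `summarize` are my own assumptions for illustration, not anything taken from either project, so check the README before relying on it.

```python
import json
import urllib.request

# Assumption: a local server with an OpenAI-style completions route is
# listening here (e.g. one started via `python -m llama_cpp.server`).
BASE_URL = "http://localhost:8000"


def build_summary_request(text, max_tokens=128):
    """Build the JSON body for an OpenAI-style completion request.

    The prompt template is a hypothetical one; tune it for your model.
    """
    prompt = (
        "Summarize the following text in a few sentences.\n\n"
        f"{text}\n\nSummary:"
    )
    return {"prompt": prompt, "max_tokens": max_tokens, "temperature": 0.2}


def summarize(text, base_url=BASE_URL):
    """POST the request to the server and return the generated summary."""
    body = json.dumps(build_summary_request(text)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    # OpenAI-style responses put the generated text under choices[0].text.
    return payload["choices"][0]["text"].strip()
```

With the server up, `summarize(long_article_text)` should return the summary string. Since the only cost is your own hardware, passing lots of tokens is limited by the model's context window rather than by per-token pricing.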