What does HackerNews think of FastChat?

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and FastChat-T5.

Language: Python

Any reason you're doing that vs. using Lambda Labs / Replicate / together.ai / Banana.dev, etc.?

There are a lot of good model deployment platforms that would make it easy to call your model behind a hosted endpoint.

-- If you do want to self-host, there are some great libraries like https://github.com/lm-sys/FastChat and https://github.com/ggerganov/llama.cpp that might be helpful

If none of these really solve your issue, feel free to email me and I'm happy to help you figure something out: [email protected]

There is probably a use for go-skynet/LocalAI[0] or lm-sys/FastChat[1], both of which can emulate the OpenAI API using local models (see the sketch below).

0: https://github.com/go-skynet/LocalAI
1: https://github.com/lm-sys/FastChat/

Edit: not sure if any of these support function calling, though.
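
For context, a minimal client-side sketch of that emulation, assuming a FastChat OpenAI-compatible server is already running on localhost:8000 and serving a model registered as "vicuna-7b-v1.5" (both are assumptions), using the pre-1.0 openai Python client:

    import openai

    # Point the stock OpenAI client at the local FastChat endpoint
    # instead of api.openai.com.
    openai.api_base = "http://localhost:8000/v1"
    openai.api_key = "EMPTY"  # FastChat does not validate the key

    # The request shape is identical to a real OpenAI ChatCompletion call.
    resp = openai.ChatCompletion.create(
        model="vicuna-7b-v1.5",  # assumed name of the locally served model
        messages=[{"role": "user", "content": "Hello! Who are you?"}],
    )
    print(resp.choices[0].message.content)

The appeal is that existing OpenAI-client code only needs its base URL swapped to talk to a local model instead.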

Cool stuff! How does this compare with FastChat, which seems to be another open source project for running LLMs?

At a glance, it seems to be going for a lot of the same goals (running LLMs behind interoperable APIs):

https://github.com/lm-sys/FastChat

These days I use FastChat: https://github.com/lm-sys/FastChat

It’s not based on llama.cpp but on Hugging Face Transformers, though it can also run on CPU.

It works well, can be distributed, and very conveniently provides the same REST API as OpenAI's GPT models.
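
For reference, a rough sketch of that distributed setup, based on FastChat's documented serve commands (the model path lmsys/vicuna-7b-v1.5 is just an example; any weights FastChat supports should work):

    # 1. Start the controller that keeps track of model workers.
    python3 -m fastchat.serve.controller
    # 2. Start one or more workers that register with the controller;
    #    add --device cpu to run on CPU instead of a GPU.
    python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5
    # 3. Expose the OpenAI-compatible REST API on top of the workers.
    python3 -m fastchat.serve.openai_api_server --host localhost --port 8000

Scaling out then comes down to launching more model workers against the same controller.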

Install a local LLM (e.g. Vicuna, https://github.com/lm-sys/FastChat) to have an offline alternative to Stack Overflow (and GPT-4).

I second this recommendation to start with llama.cpp. It can run on a regular laptop, and it gives a sense of what's possible (a minimal CLI sketch follows at the end of this comment).

If you want access to a serious GPU or TPU, the sensible solution is to rent one in the cloud. If you just want to run smaller versions of these models, you can achieve impressive results at home on consumer-grade gaming hardware.

The FastChat framework supports the Vicuna LLM, along with several others: https://github.com/lm-sys/FastChat

The Oobabooga web interface aims to become the standard interface for chat models: https://github.com/oobabooga/text-generation-webui

I don't see any indication that OpenLLaMA will run on either of those without modification. But one of them, or some other framework, may emerge as a de facto standard for running these models.
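
On the earlier point about starting with llama.cpp on a regular laptop, a minimal sketch of its classic CLI usage, assuming you already have weights converted and quantized to models/7B/ggml-model-q4_0.bin (the file name and format depend on the llama.cpp version):

    # Build the project, then run a short completion on CPU.
    make
    ./main -m models/7B/ggml-model-q4_0.bin \
        -p "Building a website can be done in 10 simple steps:" -n 256

Here -n caps the number of tokens to generate; everything runs locally with no GPU required.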

I use Vicuna[0]. It's much better than GPT4All.

Vicuna is based on a 13B model (not 7B), and its training data includes humans chatting with GPT-4, versus GPT4All's purely synthetic dataset generated by GPT-3.5.

[0] https://github.com/lm-sys/FastChat