What does HackerNews think of FastChat?

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and FastChat-T5.

Language: Python

Any reason you're doing that vs. using Lambda Labs / Replicate / together.ai / Banana.dev, etc.?

There are a lot of good model deployment platforms that would make it easy to call your model behind a hosted endpoint.

-- If you do want to self-host, there are some great libraries like https://github.com/lm-sys/FastChat and https://github.com/ggerganov/llama.cpp that might be helpful

If none of these really solve your issue, feel free to email me and I'm happy to help you figure something out: [email protected]

There is probably a use for go-skynet/LocalAI[0] or lm-sys/FastChat[1], both of which can emulate the OpenAI API using local models (see the sketch below).

0: https://github.com/go-skynet/LocalAI
1: https://github.com/lm-sys/FastChat/

Edit: not sure if any of these support function calling, though.
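
For context, a minimal client-side sketch of that emulation, assuming a FastChat OpenAI-compatible server is already running on localhost:8000 and serving a model registered as "vicuna-7b-v1.5" (both are assumptions), using the pre-1.0 openai Python client:

    import openai

    # Point the stock OpenAI client at the local FastChat endpoint
    # instead of api.openai.com.
    openai.api_base = "http://localhost:8000/v1"
    openai.api_key = "EMPTY"  # FastChat does not validate the key

    # The request shape is identical to a real OpenAI ChatCompletion call.
    resp = openai.ChatCompletion.create(
        model="vicuna-7b-v1.5",  # assumed name of the locally served model
        messages=[{"role": "user", "content": "Hello! Who are you?"}],
    )
    print(resp.choices[0].message.content)

The appeal is that existing OpenAI-client code only needs its base URL swapped to talk to a local model instead.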

Cool stuff! How does this compare with FastChat, which seems to be another open source project for running LLMs?

At a glance, it seems to be going for a lot of the same goals (running LLMs behind interoperable APIs):

https://github.com/lm-sys/FastChat

These days I use FastChat: https://github.com/lm-sys/FastChat

It’s not based on llama.cpp but on Hugging Face Transformers, though it can also run on CPU.

It works well, can be distributed, and very conveniently provides the same REST API as OpenAI's GPT models.
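
For reference, a rough sketch of that distributed setup, based on FastChat's documented serve commands (the model path lmsys/vicuna-7b-v1.5 is just an example; any weights FastChat supports should work):

    # 1. Start the controller that keeps track of model workers.
    python3 -m fastchat.serve.controller
    # 2. Start one or more workers that register with the controller;
    #    add --device cpu to run on CPU instead of a GPU.
    python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5
    # 3. Expose the OpenAI-compatible REST API on top of the workers.
    python3 -m fastchat.serve.openai_api_server --host localhost --port 8000

Scaling out then comes down to launching more model workers against the same controller.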

Install a local LLM (e.g. Vicuna, https://github.com/lm-sys/FastChat) to have an offline alternative to Stack Overflow (and GPT-4).

I second this recommendation to start with llama.cpp. It can run on a regular laptop, and it gives a sense of what's possible (a minimal CLI sketch follows at the end of this comment).

If you want access to a serious GPU or TPU, the sensible solution is to rent one in the cloud. If you just want to run smaller versions of these models, you can achieve impressive results at home on consumer-grade gaming hardware.

The FastChat framework supports the Vicuna LLM, along with several others: https://github.com/lm-sys/FastChat

The Oobabooga web interface aims to become the standard interface for chat models: https://github.com/oobabooga/text-generation-webui

I don't see any indication that OpenLLaMA will run on either of those without modification. But one of them, or some other framework, may emerge as a de facto standard for running these models.
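
On the earlier point about starting with llama.cpp on a regular laptop, a minimal sketch of its classic CLI usage, assuming you already have weights converted and quantized to models/7B/ggml-model-q4_0.bin (the file name and format depend on the llama.cpp version):

    # Build the project, then run a short completion on CPU.
    make
    ./main -m models/7B/ggml-model-q4_0.bin \
        -p "Building a website can be done in 10 simple steps:" -n 256

Here -n caps the number of tokens to generate; everything runs locally with no GPU required.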

I use Vicuna[0]. It's much better than GPT4All.

Vicuna is based on a 13B model (not 7B), and its training data includes humans chatting with GPT-4, versus GPT4All's purely synthetic dataset generated by GPT-3.5.

[0] https://github.com/lm-sys/FastChat