EDIT: Actually there's apparently been a lot of progress recently that I hadn't kept up with; see the replies to this comment.

Original message: From a quick peek at the source, this depends on the ChatGPT API for the underlying LLM. It could probably be modified to use a local copy of an LLM, but most models I've seen are 300GB+ and require significant computational resources to operate (think several $15k NVIDIA A100 compute nodes). There's a lot of effort being put in by the open source community to shrink these models and run them on commodity hardware, but so far the quality of a model's responses is correlated with how large it is (and therefore how much compute it needs). Give it a year or two and it'll probably be more reasonable to integrate a local LLM for gaming purposes.

> most models I've seen are 300GB+ and require significant computational resources to operate (think several $15k NVIDIA A100 compute nodes).

What? Where have you been the last 3 months?

> the quality of a model's responses is correlated with how large it is (and therefore how much compute it needs)

There's a lot more to this, including model architecture, training methods, number of training tokens, quality of training data, etc.

I'm not at all saying that Vicuna/Alpaca/SuperCOT or other LLaMA-based models are as good as GPT-3.5, but they should be capable of this; they still produce coherent answers.

Ideally you want 24GB of VRAM, but you can get away with less, or you can spill over into system memory (although that'll be slow).
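For a rough idea of what that looks like with Hugging Face transformers (just a sketch; the checkpoint name is only an example, swap in whichever LLaMA derivative you use, and 8-bit loading needs a CUDA GPU with bitsandbytes installed):

```python
# Sketch: loading a LLaMA-style model with transformers + accelerate,
# letting layers that don't fit in VRAM overflow into system RAM.
# Requires: pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TheBloke/vicuna-13B-1.1-HF"  # example checkpoint, use whatever you like

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # fill the GPU first, overflow remaining layers to CPU RAM
    load_in_8bit=True,   # 8-bit quantization (bitsandbytes) to roughly halve VRAM use
)

inputs = tokenizer("The guard eyes you warily and says:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```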

There's an OpenAI API proxy that might actually let this work without too much effort.
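The idea being that anything written against the official client just needs its base URL repointed at the proxy (a sketch; the localhost port here is made up, use whatever the proxy you run actually listens on):

```python
import openai

# Sketch: repoint the official OpenAI client at a local proxy that speaks
# the same API but forwards requests to a local model.
openai.api_base = "http://localhost:5001/v1"  # assumed address, not a real default
openai.api_key = "unused"  # local proxies typically ignore the key

response = openai.ChatCompletion.create(
    model="local-model",  # proxies generally ignore or remap this name
    messages=[{"role": "user", "content": "Say hello as a tavern keeper."}],
)
print(response["choices"][0]["message"]["content"])
```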

EDIT: It actually says in the readme that they plan to support StableLM, which is interesting because, at least at the moment, that's not a well-performing model.

EDIT 2: You should try the replit-code-v1-3b model. It's surprisingly good at programming: https://huggingface.co/spaces/replit/replit-code-v1-3b-demo
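If you want to run it outside the demo Space, something like this should work (a sketch; the model ships custom model code, so transformers needs trust_remote_code=True per the model card):

```python
# Sketch: running replit-code-v1-3b locally with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "replit/replit-code-v1-3b"
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=48, do_sample=True, temperature=0.2)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```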

> EDIT: It actually says in the readme that they plan to support StableLM, which is interesting because, at least at the moment, that's not a well-performing model.

I chose StableLM because it was the only other model I knew of besides ChatGPT. I'm open to adding support for other models once I've fixed some bugs.

You might consider supporting oobabooga's text-generation-webui API, which would give you support for a lot of different models really quickly.

https://github.com/oobabooga/text-generation-webui/
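For reference, calling it from code is roughly this (a sketch against the 2023-era blocking API; it assumes the webui was started with --api on its default port, and the endpoint path and payload fields may have changed between versions, so check the repo):

```python
import requests

# Sketch: calling text-generation-webui's blocking API.
# Assumed: server launched with --api, listening on port 5000.
payload = {
    "prompt": "You are a shopkeeper NPC. Greet the player.",
    "max_new_tokens": 80,
    "temperature": 0.7,
}

resp = requests.post("http://localhost:5000/api/v1/generate", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```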