The culprit seems to be:

> "Chollet thinks he knows what’s going on: summer vacation... a significant portion of students using ChatGPT to do their homework. It’s one of the most common uses for ChatGPT, according to Sam Gilbert, a data scientist and author."

https://finance.yahoo.com/news/chatgpt-suddenly-isn-t-boomin...

Not a student, but I stopped using ChatGPT because OpenAI is unreliable. Moved on to other models and providers.

I went self-hosted.

It was about time to build a new desktop anyway (roughly 4 to 6 years before the old one goes to frolic at the server farm in the basement), and $2,000 will easily buy a machine that can run the quantized 65B models right now. So I spent slightly more than I normally do on this latest box, and it's happily spitting out 10+ tokens a second.
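For a rough sense of why quantization is what makes a 65B model fit on a desktop, here is a back-of-the-envelope sketch. It only counts the weights, and the bit widths are illustrative assumptions: real quantized formats store extra per-block scale data, and you also need room for context and runtime overhead, so treat the numbers as a lower bound.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    n_params = 65e9  # "65B" parameter count, taken at face value
    # Bit widths below are assumed round numbers, not exact file formats.
    for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label:>5}: {weight_memory_gib(n_params, bits):6.1f} GiB")
```

The takeaway is just the ratio: going from 16 bits to roughly 4 bits per weight cuts the footprint to about a quarter, which is the difference between server-class memory and something a well-specced desktop can hold.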

You're not going to beat GPT-4 yet, but you have direct control over where your info goes, what model you're running, compliance with work policies against using public AI, and relatively cheap fixed costs.

Not to mention, the local version works with no internet and isn't subject to provider outages (not entirely true, but you're the provider and can fix them yourself).

Seems like an easy win for anyone who might be buying a desktop for graphics or gaming anyway.

Super interesting! Can you point me to some of the models and repos you used to do this?

For base tooling, things like:

https://huggingface.co/ (finding models and downloading them)

https://github.com/ggerganov/llama.cpp (llama)

https://github.com/cmp-nct/ggllm.cpp (falcon)

For interactive work (art/chat/research/playing around), things like:

https://github.com/oobabooga/text-generation-webui/blob/main... (llama) (Also, a decent chat server was just added to the llama.cpp project itself)

https://github.com/invoke-ai/InvokeAI (stable-diffusion)

Plus a bunch of hacked-together scripts.

Some example models (I'm linking to quantized versions that someone else has made, but the tooling to create them from the published fp16 models is in the repos above):

https://huggingface.co/TheBloke/llama-65B-GGML

https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ

https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored...

etc. Hugging Face has quite a number, although some require filling out forms to get the base models for tuning/training.
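If "quantized from fp16" is unfamiliar, the basic idea can be sketched in a few lines. This is a toy illustration only: it is not the GGML or GPTQ algorithm, the function names are made up for this example, and the real tools use their own block layouts and more sophisticated schemes. It just shows a block of floats being squeezed into small signed integers plus one scale factor, then expanded back.

```python
def quantize_block(values, bits=4):
    """Toy symmetric quantization of one block of floats.

    Returns (scale, ints), where each int fits in `bits` signed bits.
    """
    qmax = 2 ** (bits - 1) - 1              # 7 when bits == 4
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0    # avoid dividing by zero
    ints = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, ints


def dequantize_block(scale, ints):
    """Recover approximate floats from the stored scale and integers."""
    return [scale * q for q in ints]


if __name__ == "__main__":
    weights = [0.12, -0.5, 0.33, 0.0]       # made-up example values
    scale, ints = quantize_block(weights)
    restored = dequantize_block(scale, ints)
    for w, q, r in zip(weights, ints, restored):
        print(f"{w:+.3f} -> {q:+d} -> {r:+.3f}")
```

Each restored value lands within half a quantization step of the original, which is the precision you trade away for the smaller file.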