LLMs are incredibly expensive to run.

I imagine the huge demand that ChatGPT is seeing would make any cloud vendor sweat if it were suddenly lumped on top of their usual load.

To me it's entirely unsurprising that OpenAI would have trouble keeping up. Good luck to them.

I think from Apple’s POV, this is great news. Moore’s Law has been dead for years, and there has been no good reason to upgrade your devices until now. AI means it’s 1990 again, and you need to buy a new device every 18 months because the performance leap is so meaningful to the UX.

I'd agree if Apple were in the business of making datacentre infrastructure.

Nobody is running these large scale models on their personal devices.

Sure, some of the image generation tech is seeing personal use, so you'd have a point there, but these immense language models are something else entirely.

GLM-130B[1] (a 130-billion-parameter model, vs GPT-3's 175 billion) is able to run optimally on high-end consumer hardware, 4x RTX 3090 in particular. That's under $4k at current prices, and as hardware prices go, one can only imagine what it'll cost in a year or two. It can also run, with degraded performance, on lesser systems.
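For a rough sense of why four 3090s is enough, here's some napkin math (my own numbers, not figures from the GLM-130B repo): at INT4 quantization each parameter takes half a byte, so the weights fit in the combined VRAM of four 24 GB cards.

```python
# Back-of-envelope VRAM check: can 130B parameters fit on 4x RTX 3090?
# Assumed figures (mine, not from the GLM-130B repo): INT4 weights,
# i.e. 0.5 bytes per parameter, plus headroom for activations.

params = 130e9                 # GLM-130B parameter count
bytes_per_param = 0.5          # INT4 quantization: 4 bits = 0.5 bytes
weights_gb = params * bytes_per_param / 1e9   # ~65 GB of weights

vram_per_card_gb = 24          # RTX 3090
num_cards = 4
total_vram_gb = vram_per_card_gb * num_cards  # 96 GB

print(f"weights: ~{weights_gb:.0f} GB, available: {total_vram_gb} GB")
# weights: ~65 GB, available: 96 GB -> fits, with room left for
# activations and the KV cache; FP16 weights (~260 GB) would not.
```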

It's a whole lot cheaper to run neural-net-style systems than to train them. "Somebody on Twitter"[2] got it set up, broke down the costs, demonstrated some prompts, and so on. The cliff notes: a fraction of a penny per query, with each taking about 16s to generate. The output's pretty terrible, but it's unclear to me whether that's inherent or a matter of priorities. I expect OpenAI spent a lot of manpower on supervised training, whereas this system probably had minimal, especially in English (it's from a Chinese university).
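To see how you get to "a fraction of a penny", here's a rough cost model for self-hosting. All inputs are my assumptions for illustration (electricity rate, GPU power draw), not figures from the linked thread; only the 16s generation time comes from it.

```python
# Rough per-query electricity cost for inference on 4x RTX 3090.
# All inputs below are illustrative assumptions, not numbers from [2].

power_kw = 4 * 0.35 + 0.3      # four ~350 W GPUs plus the rest of the box
electricity_per_kwh = 0.15     # USD, a typical-ish residential rate
hourly_cost = power_kw * electricity_per_kwh   # ~$0.26/hour

seconds_per_query = 16         # generation time quoted in the thread
queries_per_hour = 3600 / seconds_per_query    # 225

cost_per_query = hourly_cost / queries_per_hour
print(f"~${cost_per_query:.4f} per query")     # ~$0.0011 -- a fraction of a penny
```

Even amortizing the roughly $4k of hardware over a few years of continuous use only adds another fraction of a cent per query, so the conclusion holds either way.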

[1] - https://github.com/THUDM/GLM-130B

[2] - https://twitter.com/alexjc/status/1617152800571416577