Having enough scale to perpetually offer free/low-cost compute is a moat. The primary reason ChatGPT went viral in the first place was that it was free, with no restrictions. Back in 2019, GPT-2 1.5B was made freely accessible by a single developer via the Talk to Transformer website, which got many people talking about AI text generation for the first time...then the owner got hit with sticker shock from the GPU compute needed to scale.

AI text generation competitors like Cohere and Anthropic will never be able to compete with Microsoft/Google/Amazon on marginal cost.

Charity is only a moat if it's not profitable: if giving away compute made money, anyone could match it. Only the players who can eat the losses indefinitely get to use it as a barrier.

This is the timeline that's scaring the shit out of them:

Feb 24, 2023: Meta launches LLaMA, a relatively small AI model whose weights were available to researchers on request (not truly open-source, though the leak below made that moot).

March 3, 2023: LLaMA is leaked to the public, spurring rapid innovation.

March 12, 2023: Artem Andreenko runs LLaMA on a Raspberry Pi, inspiring minification efforts.

March 13, 2023: Stanford's Alpaca adds instruction tuning to LLaMA, enabling low-budget fine-tuning (a sketch of the prompt format appears after this timeline).

March 18, 2023: Georgi Gerganov's 4-bit quantization enables LLaMA to run on a MacBook CPU (a rough sketch of the idea appears at the end of this post).

March 19, 2023: Vicuna, a 13B model, achieves "parity" with Bard at a $300 training cost.

March 25, 2023: Nomic introduces GPT4All, both a model (trained for about $100) and an ecosystem gathering models like Vicuna in one place.

March 28, 2023: Cerebras trains and open-sources Cerebras-GPT, a family of GPT-3-architecture models, making the community independent of LLaMA.

March 28, 2023: LLaMA-Adapter achieves SOTA on the multimodal ScienceQA benchmark with just 1.2M learnable parameters.

April 3, 2023: Berkeley's Koala dialogue model rivals ChatGPT in user preference at a $100 training cost.

April 15, 2023: Open Assistant releases an open-source RLHF model and dataset, making alignment more accessible.
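
To make the Alpaca entry concrete: "instruction tuning" just means fine-tuning the base model on instruction/response pairs rendered into a fixed prompt template, so the model learns to answer requests instead of merely continuing text. Here's a minimal sketch of Alpaca-style formatting (the template wording follows Stanford's published repo; the helper function itself is illustrative):

```python
# Alpaca-style instruction formatting: each training example is an
# (instruction, optional input, output) triple rendered into one text
# prompt that the base model is fine-tuned to complete.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def format_example(example: dict) -> str:
    """Render one training example; the model learns to produce the output."""
    if example.get("input"):
        prompt = PROMPT_WITH_INPUT.format(**example)
    else:
        prompt = PROMPT_NO_INPUT.format(instruction=example["instruction"])
    return prompt + example["output"]

print(format_example({
    "instruction": "Give three tips for staying healthy.",
    "input": "",
    "output": "1. Eat a balanced diet...",
}))
```

That's the whole trick: 52K of these pairs and a few GPU-hours turned a raw completion model into something that follows instructions.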

This really ought to mention https://github.com/oobabooga/text-generation-webui, which was the first popular UI for LLaMA and remains a popular choice for anyone running it on a GPU. It's also where GPTQ 4-bit quantization was first enabled in a LLaMA-based chatbot; llama.cpp picked up 4-bit quantization later.
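
Since 4-bit quantization comes up twice above (Gerganov's llama.cpp work and GPTQ), here's what it mechanically does: weights are split into small blocks, and each block stores one floating-point scale plus 4-bit integers, cutting the memory footprint roughly 6x versus fp32 — enough to fit a 7B model in laptop RAM. A minimal round-to-nearest sketch (illustrative only; llama.cpp's Q4_0 packs its bits differently, and GPTQ goes further by compensating rounding error as it quantizes each layer):

```python
import numpy as np

def quantize_q4(w: np.ndarray, block_size: int = 32):
    """Round-to-nearest 4-bit quantization: each block of 32 weights shares
    one fp32 scale; individual values become integers in [-8, 7]."""
    blocks = w.reshape(-1, block_size)
    scale = np.abs(blocks).max(axis=1, keepdims=True) / 7.0  # per-block scale
    scale[scale == 0] = 1.0                                  # guard all-zero blocks
    q = np.clip(np.round(blocks / scale), -8, 7).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_q4(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Approximate reconstruction; storage drops from 32 to ~5 bits/weight
    (4-bit ints plus one shared fp32 scale per 32-value block)."""
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.randn(4096).astype(np.float32)
q, s = quantize_q4(w)
err = np.abs(w - dequantize_q4(q, s)).max()
print(f"max abs reconstruction error: {err:.4f}")  # small relative to |w|
```

The surprise of March 2023 was how little quality this costs in practice: LLMs turn out to tolerate this precision loss well, which is exactly why a $0 laptop deployment became possible.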