The current paradigm treats AI as a destination: a product you go to and interact with.

That's not at all how the masses are going to interact with AI in the near future. It's going to be seamlessly integrated into everyday software: in Office/Google Docs, at the operating system level (Android), in your graphics editor (Adobe), and on major web platforms: search, image search, YouTube, and the like.

Since Google and other Big Tech companies continue to control these billion-user platforms, they have AI reach, even if they are temporarily behind in capability. They'll also find a way to integrate it so that you don't pay for the capability directly; it gets paid for in other ways: ads.

OpenAI faces the existential risk, not Google. Google will catch up, and it has the reach/subsidy advantage.

And it doesn't end there. The so-called "competition" from open source is going to be free labor: any winning idea will be ported into Google's products on short notice. Thanks, open source!

I think the problem with AI being everywhere and ubiquitous is that AI is the first technology in a very long time that requires non-trivial compute power, and that compute power costs money. This is why you only get a limited number of GPT-4 messages every few hours: it simply costs too much to be a ubiquitous technology.

For example, the biggest LLaMA model only runs on an A100, which costs about $15,000 on eBay. The new H100, roughly 3x faster, goes for about $40,000, and both cards can only support a limited number of concurrent users, not the tens of thousands a high-end web server can handle.
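To put rough numbers on that, here's a back-of-envelope amortization. The card price comes from above; the lifespan and the number of concurrent users per card are my own assumptions, not benchmarks:

    # Amortized hardware cost per user-hour on a single A100.
    # Price is from the comment above; lifespan and concurrency
    # are rough assumptions for illustration only.
    a100_price = 15_000      # USD, used A100
    lifespan_years = 3       # assumed useful life of the card
    concurrent_users = 20    # assumed users one card can serve via batching

    hours = lifespan_years * 365 * 24
    print(f"${a100_price / (hours * concurrent_users):.3f} per user-hour")
    # ~$0.029/user-hour for the card alone, before power, hosting,
    # and provisioning for peak load.

Even three cents per user-hour for hardware alone is orders of magnitude above what serving a conventional web request costs.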

I'd imagine Google would lose a lot of money if they put GPT-4-level AI into every search, and they are obsessed with cost per search. Multiply that cost by billions of queries and it's the kind of thing that will not be cheap enough to be ad-supported.
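A quick scale check, using a commonly cited daily-search estimate and an assumed per-query inference cost; both inputs are illustrative guesses, not Google's actual numbers:

    # Scale check: GPT-4-class inference on every Google search.
    # Both inputs below are illustrative assumptions.
    searches_per_day = 8.5e9   # commonly cited estimate of daily searches
    cost_per_query = 0.01      # assumed USD per LLM completion

    daily = searches_per_day * cost_per_query
    print(f"${daily/1e6:.0f}M/day, ${daily*365/1e9:.0f}B/year")
    # ~$85M/day, ~$31B/year, versus a tiny fraction of a cent for a
    # conventional search, which is what ads comfortably cover today.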

The biggest LLaMA model has near-100% fidelity (it's around 99.3%) at 4-bit quantization, which lets it fit on any 40 GB or 48 GB GPU; those go for about $3,500.

Or, at about a 10x speed reduction, you can run it on 128 GB of system RAM, which costs only around $250.
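The memory arithmetic behind both options, as a quick sketch (the parameter count is LLaMA 65B; the overhead constant for KV cache and activations is an assumption):

    # Footprint of the 65B-parameter LLaMA model at different precisions.
    # The overhead constant (KV cache / activations) is an assumption.
    params = 65e9

    def footprint_gb(bits_per_weight, overhead_gb=4):
        return params * bits_per_weight / 8 / 1e9 + overhead_gb

    print(f"fp16:  {footprint_gb(16):.0f} GB")  # ~134 GB: needs multiple GPUs
    print(f"4-bit: {footprint_gb(4):.0f} GB")   # ~36 GB: fits a 40/48 GB card,
                                                # or easily in 128 GB of system RAM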

The story is nowhere near as bleak as you paint it.

I haven't seen any repos or guides for running LLaMA on that amount of RAM, which is something I do have. Any pointers?

Run text-generation-webui with llama.cpp: https://github.com/oobabooga/text-generation-webui
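If you'd rather script it than use the web UI, a minimal CPU-only sketch with the llama-cpp-python bindings looks like this. The model filename is a placeholder; point it at whatever 4-bit quantized file you have:

    # Minimal CPU-only sketch using llama-cpp-python
    # (pip install llama-cpp-python). The model path below is a
    # placeholder for your own 4-bit quantized model file.
    from llama_cpp import Llama

    llm = Llama(model_path="./models/llama-65b.q4_0.bin", n_threads=8)
    out = llm("Q: Name the planets in the solar system. A:",
              max_tokens=64, stop=["Q:"])
    print(out["choices"][0]["text"])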