But right now, what incentive do I have to buy a new laptop? I got this 16GB M1 MBA two years ago and it's literally everything I need: it always feels fast, runs silent, etc.

1. The idea would be that there is now a reason to buy loads more RAM, whereas currently the market for 64GB is pretty niche

2. 64GB is a big laptop today; in a few years' time that will be small. And LLaMA 65B quantized to int4 should fit comfortably (rough arithmetic sketched after this list)

3. LLMs will be a commodity. There will be a free one

4. LLMs seem to avoid the need for fine-tuning by virtue of their size; what we see now with the largest models is that you just do prompt engineering. Making use of personal data is a case of LangChain + vector stores (or however the future of that approach pans out)
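
A rough back-of-the-envelope for point 2, as a minimal Python sketch; the ~4.5 effective bits per weight (int4 payload plus per-block scale factors) and the KV-cache allowance are my assumptions, not measured numbers:

    # Does LLaMA 65B quantized to int4 fit in 64GB of unified memory?
    params = 65e9               # ~65 billion weights
    bits_per_weight = 4.5       # int4 plus quantization scale factors (assumption)
    kv_cache_gb = 5             # rough allowance for the context/KV cache (assumption)

    weights_gb = params * bits_per_weight / 8 / 1e9
    print(f"weights ~{weights_gb:.0f} GB, total ~{weights_gb + kv_cache_gb:.0f} GB")
    # -> weights ~37 GB, total ~42 GB, leaving headroom on a 64GB machine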

1. You're working backwards from a desire to buy more RAM to try and find uses for it. You don't actually need more RAM to use LLMs: ChatGPT requires no local memory, is instant, and is available for free today.

2. Why would anybody be satisfied with a 64GB model when GPT-4 or 5 or 6 might even be using 1TB of RAM?

3. That may not be the case. With every day that passes, it becomes more and more clear that large LLMs are not that easy to build. Even Google has failed to make something competitive with OpenAI. It's possible that OpenAI is in fact the new Google, that they have been able to establish permanent competitive advantage, and there will no more be free commodity LLMs than there are free commodity search engines.

Don't get me wrong, I would love there to be high-quality local LLMs. I have at least two use cases that you can't do, or can't do well, with the OpenAI API, and being able to run LLaMA locally would fix that problem. But I just don't see that being a common case, and at any rate I would need server hardware to do it properly, not a Mac laptop.

1. You're working backwards from a desire to buy more RAM to try and find uses for it.

I'm really not.

I had no desire at all until a couple of weeks ago. Even now, not so much, since it wouldn't be very useful to me.

But the current LLM business model, where there are a small number of API providers and anything built with this new tech is forced into a subscription model... I don't see it as sustainable, and I think the buzz around llama.cpp is a taste of that.

I'm saying: imagine a future where it is painless to run a ChatGPT-class LLM on your laptop (that sounded crazy a year ago; to me it now looks inevitable within a few years), then have a look at the kinds of things that can be done today with LangChain... then extrapolate.
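
To make the "LangChain + vector store over personal data" idea concrete, a minimal sketch; the FAISS and OpenAIEmbeddings choices are illustrative assumptions (LangChain's API moves quickly, and this assumes an OpenAI key and faiss installed):

    # Index some personal notes locally, then retrieve relevant chunks for a prompt.
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import FAISS

    notes = [
        "Meeting notes: the Q3 roadmap slipped by two weeks.",
        "Reminder: renew the TLS certificate before June.",
    ]

    # Embed the notes and store the vectors in a local FAISS index.
    db = FAISS.from_texts(notes, OpenAIEmbeddings())

    # Pull the most relevant chunk and stuff it into whatever LLM prompt you like:
    # a hosted API today, or the local ChatGPT-class model being imagined here.
    docs = db.similarity_search("When does the certificate need renewing?", k=1)
    prompt = f"Context:\n{docs[0].page_content}\n\nQuestion: when does the cert need renewing?"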

It sounds like we are in a similar position. I had no desire to get a 64GB laptop from Apple until all the interesting things from running LLaMA locally came out. I wasn't even aware of the specific benefit of the unified memory model on the Mac. Now I'm weighing whether I want 64, 96, or 128GB, for an insane amount of money: 5k for the top-end one.

The unified memory ought to be great for running LLaMA on the GPU on these MacBooks (since it can't run on the Neural Engine currently).

The point of llama.cpp is that most people don't have a GPU with enough VRAM; Apple's unified memory ought to solve that.

Some people have it working apparently:

https://github.com/remixer-dec/llama-mps
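
For context, llama-mps runs the model through PyTorch's Metal (MPS) backend rather than the Neural Engine. A minimal sketch of that device selection (the matrix sizes are just placeholders):

    # Use Apple's Metal (MPS) backend if available; tensors on "mps" live in the
    # same unified memory pool as the CPU, which is what lets big models fit.
    import torch

    device = "mps" if torch.backends.mps.is_available() else "cpu"
    print(f"running on: {device}")

    x = torch.randn(4096, 4096, device=device)  # placeholder workload
    y = x @ x                                    # executes on the GPU when device == "mps"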