Remove "and Python 3.11" from the title. Python is only used for converting the model to the llama.cpp format; 3.10 or whatever is fine.
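(For context, that Python step is just a one-off conversion of the original PyTorch weights into the ggml format llama.cpp reads. A rough sketch, using the script name the repo shipped at the time, which may since have changed:)

    # convert the original 7B PyTorch weights to ggml FP16 format
    python3 convert-pth-to-ggml.py models/7B/ 1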

Additionally, llama.cpp works fine on 10-year-old hardware, as long as it supports AVX2.

I'm running llama.cpp right now on an ancient 2013 Intel i5 MacBook with only 2 cores and 8 GB of RAM: the 7B 4-bit model loads in 8 seconds into 4.2 GB of RAM and runs at about 600 ms per token.

btw: does anyone know how to disable swap per process in macOS? Even though there is enough free RAM, macOS sometimes decides to use swap instead.

Can you provide a link to the guide or steps you followed to get this up and running? I have a physical Linux machine with 300+ GB of RAM and would love to try LLaMA on it, but I'm not sure where to start to get it working with such a configuration.

Edit: Thank you, @diimdeep!

Sure. You can get the models via magnet link from here: https://github.com/shawwn/llama-dl/

To get it running, just follow the steps here: https://github.com/ggerganov/llama.cpp/#usage
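Roughly, the flow from that README looks like the following on a Linux box. Treat it as a sketch: the exact script names and quantize arguments may have changed since, and the paths and model size (7B) are just examples.

    # build the repo
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    make

    # place the downloaded LLaMA weights under ./models, install the Python
    # deps, then convert the 7B weights to ggml FP16 format
    python3 -m pip install torch numpy sentencepiece
    python3 convert-pth-to-ggml.py models/7B/ 1

    # quantize to 4 bits (2 = q4_0 in the versions I used)
    ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2

    # run inference
    ./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128 -p "Hello, my name is"

With 300+ GB of RAM you can run the larger checkpoints too; the convert/quantize/run steps are the same with 13B/30B/65B in place of 7B.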