> This all changed yesterday, thanks to the combination of Facebook’s LLaMA model and llama.cpp by Georgi Gerganov.

George Hotz was so confident that he was riding the wave with his Python implementation: https://github.com/geohot/tinygrad/blob/master/examples/llam.... But I guess not; pure C++ seems better.

Isn't it the four-bit quantization, more than the choice of C++ as the orchestrator, that's the win? In neither the C++ nor the Python case is that high-level code actually doing the matrix multiplications.
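For anyone wondering what "four-bit quantization" means mechanically, here is a minimal sketch of block-wise symmetric 4-bit quantization. It is loosely modeled on the idea behind llama.cpp's early Q4_0 format (a group of weights shares one float scale, and each weight is stored as a signed 4-bit integer), but the block size of 32, the scale choice, and the nibble packing order are my assumptions for illustration, not llama.cpp's actual on-disk layout:

```python
# Block-wise symmetric 4-bit quantization: a sketch, NOT llama.cpp's exact format.
# Assumption: 32 weights share one float scale; each weight becomes a signed
# 4-bit integer in [-8, 7], and two such integers are packed into one byte.

BLOCK_SIZE = 32  # assumed block size


def quantize_block(block):
    """Return (scale, ints) where scale * ints approximates the block."""
    amax = max(abs(x) for x in block)
    scale = amax / 7 if amax > 0 else 1.0  # largest magnitude maps to +/-7
    # Clamp to the signed 4-bit range; with this scale, values land in [-7, 7].
    q = [max(-8, min(7, round(x / scale))) for x in block]
    return scale, q


def quantize(weights):
    """Split a flat list of floats into blocks and quantize each one."""
    return [quantize_block(weights[i:i + BLOCK_SIZE])
            for i in range(0, len(weights), BLOCK_SIZE)]


def dequantize(blocks):
    """Reconstruct approximate floats from (scale, ints) blocks."""
    out = []
    for scale, q in blocks:
        out.extend(scale * v for v in q)
    return out


def pack_nibbles(q):
    """Pack pairs of 4-bit signed ints into bytes (low nibble first)."""
    packed = bytearray()
    for i in range(0, len(q), 2):
        lo = q[i] & 0x0F  # two's-complement nibble, e.g. -1 -> 0xF
        hi = (q[i + 1] & 0x0F) if i + 1 < len(q) else 0
        packed.append(lo | (hi << 4))
    return bytes(packed)


def unpack_nibbles(packed, n):
    """Inverse of pack_nibbles: recover n signed 4-bit ints."""
    q = []
    for byte in packed:
        for nib in (byte & 0x0F, byte >> 4):
            q.append(nib - 16 if nib >= 8 else nib)  # sign-extend the nibble
    return q[:n]
```

Under these assumptions a block of 32 weights costs 16 bytes of nibbles plus a 4-byte float32 scale, i.e. 20 bytes instead of 64 bytes in fp16. That memory (and memory-bandwidth) saving is what lets the model fit and run on a laptop, and it is independent of whether C++ or Python drives the loop.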

That basically the whole AI revolution is powered by CPython of all things (not even PyPy) is the 100-megaton nuke that should end language warring forever.

That the first AGI will likely be running under a VM so inefficient that it refcounts even integers is God laughing in the face of all the people who've spent the past decades arguing that this language or that language is "faster". Amdahl was right: only inner loops matter.
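Amdahl's law makes the "only inner loops matter" point concrete: if a fraction p of runtime is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The split below (99% of time in native matmul kernels, 1% in Python orchestration) is a hypothetical number chosen for illustration, not a measurement of any real LLaMA implementation:

```python
def amdahl_speedup(fraction_improved, factor):
    """Overall speedup when `fraction_improved` of runtime becomes `factor`x faster."""
    if not 0.0 <= fraction_improved <= 1.0:
        raise ValueError("fraction_improved must be in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / factor)


# Hypothetical split: 1% of time in the interpreter, 99% in native kernels.
orchestration_only = amdahl_speedup(0.01, 100)  # rewrite the glue 100x faster
kernels_only = amdahl_speedup(0.99, 4)          # make the kernels 4x faster

print(f"faster orchestration: {orchestration_only:.2f}x")
print(f"faster kernels:       {kernels_only:.2f}x")
```

With these made-up numbers, a 100x faster orchestrator buys about 1.01x overall, while a 4x faster kernel buys about 3.88x, which is why a refcounting VM on the outside barely registers.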

> That basically the whole AI revolution is powered by CPython of all things (not even PyPy) is the 100-megaton nuke that should end language warring forever.

And a lot of new AI tooling, such as tokenizers, has been developed for Python in Rust (via PyO3).

The original LLaMA uses Google's tooling for that, written in C++: https://github.com/google/sentencepiece