Does anyone know how llama.cpp was implemented? Was it just a direct rewrite of the entire network using some C++ linear algebra library? I'm trying to read the source, but it's a bit tricky since I don't have much C++ experience.

Georgi Gerganov rewrote the inference code on top of his own tensor library, ggml [0], rather than using an existing linear algebra library.

[0] https://github.com/ggerganov/ggml
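
To give a rough feel for what "rewriting a network on top of a tensor library" means, here is a minimal C++ sketch. It is not ggml's API and not code from llama.cpp: the `Tensor` struct and the `matmul`, `silu`, and `forward` functions are hypothetical names made up for illustration. The idea it shows is that once you have a small tensor type and a handful of operations, a layer of the network is just a composition of those operations.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal tensor type: a row-major 2-D float matrix.
// This is an illustration only, NOT ggml's actual data structure.
struct Tensor {
    std::size_t rows, cols;
    std::vector<float> data;

    Tensor(std::size_t r, std::size_t c, float fill = 0.0f)
        : rows(r), cols(c), data(r * c, fill) {}

    float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Naive matrix multiply: (m x k) * (k x n) -> (m x n).
Tensor matmul(const Tensor& a, const Tensor& b) {
    assert(a.cols == b.rows);
    Tensor out(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t p = 0; p < a.cols; ++p)
            for (std::size_t j = 0; j < b.cols; ++j)
                out.at(i, j) += a.at(i, p) * b.at(p, j);
    return out;
}

// Element-wise SiLU activation: x * sigmoid(x) = x / (1 + exp(-x)).
Tensor silu(const Tensor& x) {
    Tensor out = x;
    for (float& v : out.data) v = v / (1.0f + std::exp(-v));
    return out;
}

// One hypothetical layer: y = silu(x * W).
Tensor forward(const Tensor& x, const Tensor& w) {
    return silu(matmul(x, w));
}
```

A real library adds much more on top of this (quantized weight formats, optimized kernels, memory management), but reading the source with this "small set of tensor ops, composed into layers" picture in mind may make it easier to follow.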