PyTorch (+GPU) dependency management and the sheer diversity of Python environment/container types are particularly bad. Programmers may not perceive this, since they're already managing their Python environments and keeping the OS, libraries, containers, and applications in the alignment required for things to work, but it's quite complex. I couldn't do it.

In comparison, I could just type `git clone https://github.com/ggerganov/llama.cpp` and `make`. And it worked. Since then I've gotten llama.cpp's clBLAS partial GPU acceleration working with my AMD RX 580 8GB. Plus, with llama.cpp's CPU mmap support, I can run multiple LLM IRC bot processes against the same model, all sharing one in-RAM copy of the weights for free. Are there even ways to run 2- or 3-bit models in PyTorch implementations, like llama.cpp can? It's pretty rad that I could run a 65B LLaMA in 27 GB of RAM on my 32 GB system (and still get better perplexity than a 30B 8-bit model).
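
For anyone wondering why that sharing is free: when every process mmaps the same weights file read-only, the kernel backs all of the mappings with the same page-cache pages, so N bot processes cost roughly one copy of the model in physical RAM. Here's a minimal C sketch of the mechanism (not llama.cpp's actual loader; the `model.bin` path is made up):

```c
/* Sketch: map a model file read-only so its pages are shared across
 * processes. Any number of processes doing this against the same file
 * reuse the same physical pages via the kernel's page cache. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("model.bin", O_RDONLY);  /* hypothetical model path */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    /* PROT_READ + MAP_SHARED: pages come straight from the page cache,
       so a second process mapping the same file reuses them. */
    const unsigned char *weights =
        mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (weights == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd);  /* the mapping stays valid after close */

    printf("mapped %lld bytes; first byte: 0x%02x\n",
           (long long)st.st_size, weights[0]);

    /* ... inference would read tensors directly out of `weights` ... */
    munmap((void *)weights, st.st_size);
    return 0;
}
```

Start several of these against the same file and the weights only occupy physical RAM once, which is what makes the multi-bot setup cheap.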

> In comparison, I could just type `git clone https://github.com/ggerganov/llama.cpp` and `make`. And it worked.

You're comparing a single, well-managed project that has put effort into user onboarding against every project in a different language's ecosystem, and proclaiming that the entire language/ecosystem is crap.

The only real takeaway is that many projects, independent of language, put way too little effort into onboarding users.