Wow, fun to find this trending on HN this morning! I am currently also working on the associated video lecture (as the next episode of my video lecture series here https://karpathy.ai/zero-to-hero.html ), where I will build nanoGPT from scratch and aspire to spell everything out, as with the earlier videos. Hoping to get it out in ~2 weeks or so.

While doing my PhD some years ago (it wasn't a PhD in AI, but very much related) I trained several models with the usual stack back then (PyTorch, plus some others in TF). I realized that a lot of this stack could be rewritten in much simpler terms without sacrificing much fidelity and/or performance in the end.

Submissions like yours and other projects like this one (recently featured here as well) -> https://github.com/ggerganov/whisper.cpp make it pretty clear to me that this intuition is correct.

There are a couple of tools I created back then that could push things further in this direction. Unfortunately they're not mature enough to warrant a release, but the ideas behind them are worth a look (IMHO) and I'll be happy to share them. If there's interest on your side (or from anyone reading this thread), I'd love to talk more about it.