Slightly OT:

I have been playing around with whisper.cpp; it's nice because I can run the large model (quantized to 8 bits) at roughly real-time speed with cuBLAS on a Ryzen 7 2700 with a GTX 1050 Ti. I couldn't even run the PyTorch Whisper medium model on this card with X11 also running.

It blows me away that I can get real-time speech-to-text of this quality on a machine that is almost 5 years old.