Neat, https://github.com/openai/whisper - they have open-sourced it, even the model weights, so they are living up to their name in this instance.
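For anyone curious, the repo's README shows transcription is only a few lines of Python. A minimal sketch, assuming you've installed the package (pip install openai-whisper, or straight from the repo) and have ffmpeg available; "audio.mp3" is just a placeholder for your own recording:

    # Minimal transcription sketch using the open-sourced whisper package.
    # Assumes ffmpeg is on PATH; "audio.mp3" stands in for your own file.
    import whisper

    model = whisper.load_model("base")      # downloads the open weights on first run
    result = model.transcribe("audio.mp3")  # dict with "text", "segments", "language"
    print(result["text"])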

The four examples are stunningly good (speakers with heavy accents, speaking in a foreign language, speaking over dynamic background noise, etc.); this is far and away better than anything else I've seen. I'll be super curious to see other folks try it out and whether it's as robust as it seems, including when confronted with speech that has natural tics, uhhh's, uhmm's, and everything in between.

I think it's fair to say that AI transcription accuracy is now decidedly superior to the average human's; what the implications of that are, I'm not sure.

> Neat, https://github.com/openai/whisper - they have open-sourced it, even the model weights, so they are living up to their name in this instance.

Perhaps it will encourage people to add voice commands to their apps, which can then be sent to GPT-3.
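A minimal sketch of what that pipeline could look like, using the openai Python package's Completion API as it exists today; the model name, prompt, and file name are illustrative assumptions, not anything from the Whisper repo:

    # Hypothetical voice-command pipeline: Whisper for speech-to-text,
    # then the transcript is forwarded to GPT-3 via the Completion API.
    # Assumes OPENAI_API_KEY is set; model and prompt are placeholders.
    import os
    import openai
    import whisper

    openai.api_key = os.environ["OPENAI_API_KEY"]

    stt = whisper.load_model("base")
    command_text = stt.transcribe("voice_command.wav")["text"]

    response = openai.Completion.create(
        model="text-davinci-002",
        prompt=f"Interpret this voice command for a todo app:\n{command_text}\n",
        max_tokens=64,
    )
    print(response.choices[0].text)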