I just tested the model [1] on an RTX 3090, trying to translate a French video I found here [2].

Some observations:

- The full translation of the 6:22 minute video takes about 22 seconds (roughly 17x faster than real time)

- It detects the language by default (and correctly recognized that the audio was French)

- MIT License [3]!

- The quality of the transcription is good, but not perfect.

- The quality of the translation (if you don't count transcription errors as translation errors) is generally very good.
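For anyone who wants to reproduce this, here is a minimal sketch of how such a run could look with the Python package from [1]. The model size ("medium") and the audio path are my assumptions for illustration, not the exact setup used above, and the helper names are made up. The two small functions at the top just redo the speed arithmetic: 6:22 is 382 seconds, and 382 / 22 is about 17.

```python
import time


def parse_duration(mm_ss: str) -> int:
    """Convert an 'M:SS' string such as '6:22' into seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are processed per second of compute."""
    return audio_seconds / processing_seconds


def transcribe_and_translate(audio_path: str, model_name: str = "medium"):
    """Transcribe, then translate to English, timing the translation pass.

    audio_path and model_name are hypothetical placeholders.
    """
    # Imported lazily so the helpers above work without whisper installed.
    import whisper

    model = whisper.load_model(model_name)

    # No language is passed, so the model detects it on its own.
    transcription = model.transcribe(audio_path)

    # task="translate" asks for English output instead of a transcript.
    start = time.perf_counter()
    translation = model.transcribe(audio_path, task="translate")
    elapsed = time.perf_counter() - start

    return {
        "language": transcription["language"],
        "transcription": transcription["text"],
        "translation": translation["text"],
        "translation_seconds": elapsed,
    }


if __name__ == "__main__":
    # Speed figure from the numbers quoted above: 6:22 of audio in 22 s.
    print(round(realtime_factor(parse_duration("6:22"), 22), 1))  # 17.4
```

Calling `transcribe_and_translate("video_audio.mp3")` needs the package installed plus ffmpeg, and timings will depend on the GPU and the model size you pick.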

---

The transcription:

> Bonjour à tous, j'suis espère que vous allez bien, c''est ENTI. Et aujourd', aujourd', on se retrouve un peu physique pour parler de la termo dynamique. Vous ne vous inquiétez pas, ça va bien se passer. On va y aller ensemble, être à par exemple, je vous accompagne à travers une série de vidéos pour vous expliquer les principes de base en termo dynamique. Et bah, c''est parti, on va y aller tranquillement. Lidée, c''est vous puissiez comprendre la termo dynamique dans son ensemble. Donc, je vais vraiment prendre mon temps pour couplisser bien comprendre les notions,

The translation:

> Hello everyone, I hope you're doing well, it's NT and today we find ourselves a little physical to talk about the thermo dynamic. Don't worry, it's going well, we're going to go together and be the same. I'm going to accompany you through a series of videos to explain the basic principles in thermo dynamic. Well, let's go, we're going to go quietly. The idea is that you can understand the thermo dynamic in sound together. So I'm really going to take my time to understand the notions,

---

All in all, I'm very happy that OpenAI is publishing their models. If Stable Diffusion is any guide, people will hack some crazy things with this.

[1] https://github.com/openai/whisper

[2] https://www.youtube.com/watch?v=OFLt-KL0K7Y

[3] https://github.com/openai/whisper/blob/main/LICENSE