This is awesome.

For anyone confused about the project, it is using whisper.cpp, a C/C++ port of OpenAI's open Whisper model with a lightweight runner. It is built by the author of GGML and llama.cpp: https://github.com/ggerganov

You can fork this code, run it on your own server, and hit the API. The server itself uses FFmpeg to convert the audio file into the format whisper.cpp expects and then runs the C/C++ port of the Whisper model against it.
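
To make that concrete, here is a rough sketch of what such a server does under the hood, assuming the whisper.cpp main binary and a ggml model are already built and downloaded. The paths and file names are placeholders, not this project's actual layout:

    import subprocess

    def transcribe(input_path, model="models/ggml-base.en.bin", whisper_bin="./main"):
        # whisper.cpp expects 16 kHz, mono, 16-bit PCM WAV input
        wav_path = input_path + ".16k.wav"
        subprocess.run(
            ["ffmpeg", "-y", "-i", input_path,
             "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wav_path],
            check=True,
        )
        # -nt drops timestamps so stdout is just the recognized text
        result = subprocess.run(
            [whisper_bin, "-m", model, "-f", wav_path, "-nt"],
            check=True, capture_output=True, text=True,
        )
        return result.stdout.strip()

    print(transcribe("voicemail.mp3"))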

By doing this you can avoid the per-use fee that OpenAI charges for its hosted Whisper service and fully own your transcriptions. The models the author has supplied here are rather small but should run decently on a CPU. If you want to move up to the larger model sizes, you would likely need to change the compilation options and use a beefier server, or one with a GPU.
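
If you do want to try a bigger model, whisper.cpp ships a models/download-ggml-model.sh helper; something like the snippet below does the same thing from Python. The Hugging Face URL layout is my assumption, so check the repo's script for the canonical source:

    import urllib.request

    size = "medium.en"  # tiny / base / small / medium / large
    url = ("https://huggingface.co/ggerganov/whisper.cpp/"
           f"resolve/main/ggml-{size}.bin")
    # medium.en is roughly 1.5 GB; larger models also need much more RAM at runtime
    urllib.request.urlretrieve(url, f"ggml-{size}.bin")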

Similar to this project, my product https://superwhisper.com uses these whisper.cpp models to provide really good dictation on macOS.

It runs really fast on the M-series chips. Most of this message was dictated using superwhisper.

Congrats to the author of this project. It seems like a useful application of whisper.cpp.

I wonder if they would accept it upstream in the examples.

One caveat here is that whisper.cpp does not offer any CUDA support at all; acceleration is only available on Apple Silicon.

If you have Nvidia hardware, the CTranslate2-based faster-whisper is very, very fast: https://github.com/guillaumekln/faster-whisper
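
For reference, basic usage looks roughly like this (from memory of the faster-whisper README, so treat the exact parameters as approximate):

    from faster_whisper import WhisperModel

    # Runs the large-v2 model in fp16 on an Nvidia GPU via CTranslate2
    model = WhisperModel("large-v2", device="cuda", compute_type="float16")

    segments, info = model.transcribe("audio.mp3", beam_size=5)
    print("Detected language:", info.language)
    for segment in segments:
        print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")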