https://github.com/Const-me/Whisper
I had fun with both of these. They will both do realtime transcription. But you will have to download the pretrained models…
Whisper itself is open source, and so is that implementation. The OpenAI endpoint is merely a convenience for those who don't wish to host a Whisper server themselves, deal with batching, rent GPUs, etc. If you're building a commercial service on Whisper, the API might be worth it for the convenience, but if you're running it personally on a good enough machine (an M1 MacBook Air will do), running it locally is usually better.
There is a great open-source implementation named whisper.cpp, and there are a few graphical user interfaces for it:
https://github.com/ggerganov/whisper.cpp
https://goodsnooze.gumroad.com/l/macwhisper
Personally, I use MacWhisper Pro because it's very convenient.
Not sure if it'll work on an Arduino, but maybe take a look at https://github.com/ggerganov/whisper.cpp; it works on a Raspberry Pi at least, so its resource requirements are fairly minimal.
He has already done great work here: https://github.com/ggerganov/whisper.cpp
If you own a GPU, use this one: https://github.com/openai/whisper
If you don't own a GPU, use this one: https://github.com/ggerganov/whisper.cpp (this one is very, very slow)
I don't know exactly what use case would require running this via an API; the compute isn't huge (I used only the CPU, on an M1), and the memory requirements are modest.
Submissions like yours, and projects like https://github.com/ggerganov/whisper.cpp (recently featured here as well), make it pretty clear to me that this intuition is correct.
There are a couple of tools I created back then that could push things further in this direction. Unfortunately, they're not mature enough to warrant a release, but the ideas they embody are worth a look (IMHO), and I'll be happy to share them. If there's interest on your side (or from anyone reading this thread), I'd love to talk more about it.
I believe this runs client-side, but whether it counts as open source is probably open to debate:
There are paid services that can transcribe speech to text, but I couldn't find any free ones. With the release of Whisper, this became something I thought could be solved with some minimal coding.
While Whisper relies on GPUs for good performance, whisper.cpp does not: it can run on a CPU with 1 GB of RAM (about 500 MB of that for the model). Enter the Pi 4.
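As a rough sanity check on the ~500 MB figure: the memory taken by model weights is approximately parameter count × bytes per parameter. The sketch below assumes 16-bit (2-byte) weights and uses the approximate parameter counts from OpenAI's Whisper README; the `fits_in_ram` helper and its 300 MB headroom are hypothetical guesses, not measurements, and runtime buffers are not modeled.

```python
# Back-of-envelope weight memory for Whisper models.
# Parameter counts: approximate figures from OpenAI's Whisper README.
# Default of 2 bytes per parameter assumes 16-bit (fp16) weights.
WHISPER_PARAMS = {
    "tiny": 39_000_000,
    "base": 74_000_000,
    "small": 244_000_000,
    "medium": 769_000_000,
    "large": 1_550_000_000,
}


def weight_megabytes(model: str, bytes_per_param: int = 2) -> float:
    """Approximate size of the weights alone, in megabytes (10**6 bytes)."""
    return WHISPER_PARAMS[model] * bytes_per_param / 1_000_000


def fits_in_ram(model: str, ram_mb: float, headroom_mb: float = 300.0) -> bool:
    """Hypothetical check: weights plus a guessed allowance for the OS and
    inference buffers must fit in the available RAM."""
    return weight_megabytes(model) + headroom_mb <= ram_mb


if __name__ == "__main__":
    for name in WHISPER_PARAMS:
        print(f"{name:>6}: ~{weight_megabytes(name):.0f} MB, "
              f"fits in 1 GB: {fits_in_ram(name, 1000)}")
```

Under these assumptions the small model comes to about 488 MB, in line with the ~500 MB quoted, while medium and large clearly do not fit in 1 GB.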
I wrote a Telegram bot in Python, using python-telegram-bot, that calls whisper.cpp to transcribe speech to text. My bot is open to all, but you could also run your own: with a Pi 4 and an always-on connection, you can leave it running for whenever you need it.
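The bot's code isn't shown here, but its core step (hand a voice note to whisper.cpp, read back text) can be sketched with the standard library alone. This is a hypothetical sketch, not the author's bot: the binary and model paths are assumptions, the function names are made up, and the python-telegram-bot handler that would download the voice note and call `transcribe()` is left out. The ffmpeg step follows the whisper.cpp README's advice to convert input to 16 kHz, 16-bit WAV; it also assumes the transcript is what whisper.cpp prints to stdout.

```python
import subprocess
from pathlib import Path

# Assumptions: adjust both paths to your own whisper.cpp build and model.
# Older builds produce ./main; newer ones name the binary whisper-cli.
WHISPER_BIN = "./main"
MODEL_PATH = "models/ggml-base.en.bin"


def ffmpeg_to_wav_cmd(src: str, dst: str) -> list[str]:
    """ffmpeg arguments converting any input (e.g. a Telegram OGG/Opus voice
    note) to 16 kHz, mono, 16-bit PCM WAV for whisper.cpp."""
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1",
            "-c:a", "pcm_s16le", dst]


def whisper_cmd(wav: str, threads: int = 4, binary: str = WHISPER_BIN,
                model: str = MODEL_PATH) -> list[str]:
    """whisper.cpp arguments: -m model file, -f input WAV, -t CPU threads,
    -nt to print the transcript without timestamps."""
    return [binary, "-m", model, "-f", wav, "-t", str(threads), "-nt"]


def transcribe(audio_path: str, threads: int = 4) -> str:
    """Convert the audio, run whisper.cpp, and return its stdout."""
    src = Path(audio_path)
    # Write to a new file so ffmpeg never overwrites its own input.
    wav = str(src.with_name(src.stem + "_16k.wav"))
    subprocess.run(ffmpeg_to_wav_cmd(str(src), wav), check=True,
                   capture_output=True)
    result = subprocess.run(whisper_cmd(wav, threads), check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()
```

Usage would be something like `print(transcribe("voice.ogg"))` once ffmpeg is installed and whisper.cpp is built; on a Pi 4, `threads=4` matches its four CPU cores.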
Due to the Pi 4's constraints, my bot only runs the English model, so it may produce errors for other languages.
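The "English model" limitation comes down to which model file gets loaded. whisper.cpp's ggml files are named ggml-<size>.bin for multilingual models and ggml-<size>.en.bin for English-only ones, and OpenAI only publishes English-only variants for tiny through medium. A hypothetical helper for picking the file by language might look like this (later large revisions such as large-v2 are not covered):

```python
# whisper.cpp model files: ggml-<size>.bin (multilingual) and
# ggml-<size>.en.bin (English-only). There is no English-only "large".
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}
ALL_SIZES = ENGLISH_ONLY_SIZES | {"large"}


def model_filename(size: str, language: str) -> str:
    """Hypothetical helper: use the English-only variant for English audio
    when one exists, and the multilingual model otherwise."""
    if size not in ALL_SIZES:
        raise ValueError(f"unknown model size: {size}")
    if language == "en" and size in ENGLISH_ONLY_SIZES:
        return f"ggml-{size}.en.bin"
    return f"ggml-{size}.bin"
```

For example, `model_filename("base", "de")` selects the multilingual `ggml-base.bin`, which is what non-English audio needs.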
Check out my bot here: https://web.telegram.org/k/#@shhhhhhhhhhhhhhhhh_bot
Check out Whisper here: https://openai.com/blog/whisper/
Check out whisper.cpp here: https://github.com/ggerganov/whisper.cpp
There is also a C++ re-implementation that performs well and can definitely transcribe in realtime on many machines: https://github.com/ggerganov/whisper.cpp
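Realtime transcription with a model that consumes fixed chunks of audio is commonly done with a sliding window over a growing buffer. A minimal sketch of that windowing, assuming 16 kHz mono samples (Whisper's input rate) and hypothetical step/length values; it illustrates the idea and is not whisper.cpp's actual streaming code:

```python
SAMPLE_RATE = 16_000  # Whisper models take 16 kHz mono audio


def sliding_windows(samples, step_s=3.0, length_s=10.0, sample_rate=SAMPLE_RATE):
    """Yield (start_index, window) pairs over a growing audio buffer.

    Every `step_s` seconds of new audio, emit the most recent `length_s`
    seconds (less at the very start). A trailing partial step is not
    emitted; a live system would wait for more audio instead.
    """
    step = int(step_s * sample_rate)
    length = int(length_s * sample_rate)
    for end in range(step, len(samples) + 1, step):
        start = max(0, end - length)
        yield start, samples[start:end]


if __name__ == "__main__":
    twelve_seconds = [0.0] * (12 * SAMPLE_RATE)  # silent placeholder audio
    for start, window in sliding_windows(twelve_seconds):
        # A real app would pass `window` to the model here.
        print(f"window from {start / SAMPLE_RATE:.1f}s, "
              f"{len(window) / SAMPLE_RATE:.1f}s long")
```

To keep up with live audio, transcribing each window has to finish within one step (3 s here); the overlap between windows gives the model context across step boundaries.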
Most medium- and low-quality papers are full of errors and noise, but you can still learn from them.
Get your hands dirty with real code.
I would take a look at this one:
https://github.com/ggerganov/whisper.cpp
This is a C/C++ version of Whisper that runs on the CPU. It's astoundingly fast. Maybe it won't work for your use case, but it's worth a try!
I feel compelled to point out whisper.cpp. It may be cheaper for the author but is relevant for others.
I was running Whisper on a GTX 1070 to get decent performance; it was terribly slow on an M1 Mac. whisper.cpp gives performance comparable to the 1070 while running on the M1 CPU. It is easy to build and run, and well documented.
https://github.com/ggerganov/whisper.cpp
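One way to make comparisons like "GTX 1070 vs. M1 CPU" concrete is the real-time factor (RTF): processing time divided by audio duration, where values below 1.0 mean faster than real time. A minimal sketch, assuming you supply your own transcription callable (the `transcribe` argument is hypothetical) and the clip length; no benchmark numbers are implied here.

```python
import time


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration; below 1.0 means the
    transcriber is faster than real time."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


def measure_rtf(transcribe, audio_path: str, audio_s: float) -> float:
    """Time one call of any transcription function on a file whose
    duration in seconds is known."""
    start = time.perf_counter()
    transcribe(audio_path)
    return real_time_factor(time.perf_counter() - start, audio_s)


def speedup(rtf_baseline: float, rtf_candidate: float) -> float:
    """How many times faster the candidate setup is than the baseline."""
    return rtf_baseline / rtf_candidate
```

For example, 30 s of processing for 60 s of audio is an RTF of 0.5; if another setup needs 15 s for the same clip (RTF 0.25), `speedup(0.5, 0.25)` is 2.0.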
I hope this doesn’t come off the wrong way, I love this project and I’m glad to see the technology democratized. Easily accessible high-quality transcription will be a game changer for many people and organizations.
https://github.com/ggerganov/whisper.cpp
No dependencies, no Python, runs efficiently on the CPU.