What does HackerNews think of whisper.cpp?

Port of OpenAI's Whisper model in C/C++

Language: C

https://github.com/ggerganov/whisper.cpp

https://github.com/Const-me/Whisper

I had fun with both of these. They will both do realtime transcription. But you will have to download the model weights…

It runs locally, using Whisper.cpp[1], a Whisper implementation optimized to run on CPU, especially Apple Silicon.

Whisper itself is open source, and so is this implementation; the OpenAI endpoint is merely a convenience for those who don't wish to host a Whisper server themselves, deal with batching, rent GPUs, etc. If you're building a commercial service based on Whisper, the API might be worth it for the convenience, but if you're running it personally and have a good enough machine (an M1 MacBook Air will do), running it locally is usually better.

[1] https://github.com/ggerganov/whisper.cpp

You can use Whisper to transcribe the audio to text locally on the Mac.

There is a great open-source implementation named whisper.cpp, along with a few graphical user interfaces for it:

https://github.com/ggerganov/whisper.cpp

https://sindresorhus.com/aiko

https://goodsnooze.gumroad.com/l/macwhisper

Personally I use MacWhisper Pro because it’s very convenient.

whisper.ai is apparently something completely different. I’m pretty sure OP meant OpenAI’s Whisper [0], which I think is mainly used via whisper.cpp [1].

[0]: https://github.com/openai/whisper

[1]: https://github.com/ggerganov/whisper.cpp

Thanks!

Not sure if it'll work on an Arduino, but maybe take a look at https://github.com/ggerganov/whisper.cpp -- it works on a Raspberry Pi at least, so resource requirements are fairly minimal

This looks awesome. My only nitpick: I would suggest integrating transcription with whisper.cpp [1], which in my simple CPU-based tests (likely representative of most of your user base) runs much, much faster than OpenAI's Whisper.

[1] https://github.com/ggerganov/whisper.cpp

> Some people have already had success porting Whisper to the Neural Engine, and as of 14 hours ago GGerganov (the guy who made this port of LLaMA to the Neural Engine and who made the port of Whisper to C++) posted a GitHub comment indicating he will be working on that in the next few weeks.

He has already done great work here: https://github.com/ggerganov/whisper.cpp

I'm a huge fan of Georgi (the author)! You should also check out his other work, bringing Apple Silicon support to OpenAI's Whisper (speech-to-text model): https://github.com/ggerganov/whisper.cpp
Super cool project. This is from the author of whisper.cpp, which enables highly accurate real-time audio transcription on the M1/M2:

https://github.com/ggerganov/whisper.cpp

Just download whisper ....

If you own a GPU, use this one: https://github.com/openai/whisper

If you don't own a GPU, use this one: https://github.com/ggerganov/whisper.cpp (though it is very, very slow by comparison)

I've run Whisper locally via [1] with one of the medium-sized models and it was damn good at transcribing audio from a video of two people having a conversation.

I don't know exactly what the use case is where people would need to run this via the API; the compute demands aren't huge (I used CPU only, on an M1) and the memory requirements aren't much either.

[1] https://github.com/ggerganov/whisper.cpp

I recently tried a number of options for streaming STT. Because my use case was very sensitive to latency, I ultimately went with https://deepgram.com/ - but https://github.com/ggerganov/whisper.cpp provided a great stepping stone while prototyping a streaming use case locally on a laptop.
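For anyone else prototyping streaming locally: if I recall correctly, the whisper.cpp repo ships a `stream` example that transcribes microphone input in near-real-time. A rough sketch of the invocation, with the model path and parameters as shown in the repo's README at the time I tried it, so double-check them there:

```shell
# Build the streaming example (microphone capture requires SDL2)
make stream

# Transcribe the microphone in near-real-time with the English base model:
# -t 8           use 8 CPU threads
# --step 500     process new audio every 500 ms
# --length 5000  keep a 5-second sliding context window
./stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000
```

Shrinking `--step` lowers latency at the cost of more repeated re-decoding of the sliding window.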
You can run Whisper in WASM (locally) so no need to pay for the API, plus the bandwidth. It actually works surprisingly well: https://github.com/ggerganov/whisper.cpp
While doing my PhD some years ago (it wasn't a PhD on AI, but very much related) I trained several models with the usual stack back then (PyTorch, plus some in TensorFlow). I realized that a lot of this stack could be rewritten in much simpler terms without sacrificing much fidelity or performance in the end.

Submissions like yours, and other projects like this one (recently featured here as well) -> https://github.com/ggerganov/whisper.cpp, make it pretty clear to me that this intuition is correct.

There are a couple of tools I created back then that could push things further in this direction. Unfortunately they're not mature enough to warrant a release, but the ideas behind them are worth a look (IMHO) and I'll be happy to share them. If there's interest on your side (or from anyone reading this thread), I'd love to talk more about it.

> vs the serverside systems

I believe this runs client side, but whether it counts as open source is likely open for debate:

https://github.com/ggerganov/whisper.cpp

A pet hate of mine is voice notes in WhatsApp or Telegram. Quite often they remain unheard for hours, because the call to action (the notification) doesn't let me see what I need to react to, or because I'm in meetings and can't listen for a period of time.

There are paid services that can transcribe speech to text, but I couldn't find any free ones. With the release of Whisper, this became something I thought could be solved with some minimal coding.

While Whisper relies on GPUs, whisper.cpp does not and can run on a CPU with 1 GB of RAM (about 500 MB of which goes to the model). Enter the Pi 4.

I wrote a Telegram bot in Python, using python-telegram-bot, which calls whisper.cpp to transcribe speech to text. My bot is open to all, but you could start your own: with a Pi 4 and an always-up connection, you can leave it running for whenever you need it.

Due to the constraints of the Pi 4, it only runs the English model and may produce errors for other languages.
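For anyone wanting to build a similar bot, the core shell pipeline is roughly this (filenames are hypothetical; Telegram delivers voice notes as Opus audio, while whisper.cpp expects 16-bit, 16 kHz, mono WAV):

```shell
# Hypothetical filenames. Telegram voice notes arrive as Opus (.oga),
# but whisper.cpp wants 16-bit, 16 kHz, mono WAV, so convert first:
ffmpeg -i note.oga -ar 16000 -ac 1 -c:a pcm_s16le note.wav

# Transcribe with the English base model; -otxt writes the transcript
# to note.wav.txt alongside the input file
./main -m models/ggml-base.en.bin -f note.wav -otxt
cat note.wav.txt
```

The bot then just needs to download the voice note, run this pipeline via subprocess, and reply with the contents of the .txt file.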

Check out my bot here: https://web.telegram.org/k/#@shhhhhhhhhhhhhhhhh_bot

Check out Whisper here: https://openai.com/blog/whisper/

Check out whisper.cpp here: https://github.com/ggerganov/whisper.cpp

There are various size options for the models; the smaller ones trade accuracy for higher performance.
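As a sketch (model names and approximate sizes as listed in the whisper.cpp README, so double-check them there), the repo's helper script downloads any of the converted ggml models by name:

```shell
# Models range from tiny (~75 MB on disk) up to large (a few GB);
# larger models are more accurate but slower and need more RAM.
bash ./models/download-ggml-model.sh base.en

# Transcribe the sample audio bundled with the repo
./main -m models/ggml-base.en.bin -f samples/jfk.wav
```

Swapping `base.en` for `tiny.en`, `small`, `medium`, or `large` moves along the accuracy/performance trade-off.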

There is also a C++ re-implementation that performs well and can definitely transcribe in realtime on many machines: https://github.com/ggerganov/whisper.cpp

Read all the leading papers, many times, to get a deep understanding. The writing quality is usually pretty low, but the information density can be very high, and you'll probably miss the important details the first time.

Most medium and low-quality papers are full of errors and noise, but you can still learn from them.

Get your hands dirty with real code.

I would take a look at those:

https://github.com/geohot/tinygrad

https://github.com/ggerganov/whisper.cpp

It's one of the reasons I recently ported the Whisper model to plain C/C++. You just clone the repo, run `make [model]` and you are ready to go. No Python, no frameworks, no packages - plain and simple.

https://github.com/ggerganov/whisper.cpp
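Concretely, that quick start looks something like this (assuming the Makefile still provides the per-model convenience targets):

```shell
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp

# Convenience target: fetches the base.en model and transcribes
# the sample audio shipped with the repo
make base.en
```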

Not a hack per se but a complete reimplementation.

https://github.com/ggerganov/whisper.cpp

This is a C/C++ version of Whisper which uses the CPU. It's astoundingly fast. Maybe it won't work in your use case, but you should try!

This is a cool project. I’ve been very happy with whisper as an alternative to otter; it works better and solves real problems for me.

I feel compelled to point out whisper.cpp. It may be cheaper for the author to run, and it's relevant for others as well.

I was running Whisper on a GTX 1070 to get decent performance; it was terribly slow on an M1 Mac. whisper.cpp has performance comparable to the 1070 while running on the M1's CPU. It is easy to build and run, and well documented.

https://github.com/ggerganov/whisper.cpp

I hope this doesn’t come off the wrong way, I love this project and I’m glad to see the technology democratized. Easily accessible high-quality transcription will be a game changer for many people and organizations.

You might find my inference implementation of Whisper useful [0]. It has a C-style API that allows for easy integration in other projects and you can control how many CPU threads to be used during the processing.

[0] https://github.com/ggerganov/whisper.cpp
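As an illustration of the thread control mentioned above, from the command line it is the `-t` flag (the same knob is, as I understand it, exposed as a parameter of the C-style API):

```shell
# Pick the number of CPU threads with -t; more threads generally
# means faster transcription, up to the number of physical cores
./main -m models/ggml-base.en.bin -f audio.wav -t 8
```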

On M1 Pro, with the Greedy decoder and the medium model, I can transcribe 1 hour of audio in just 10 minutes (~6x real-time) [0].

[0] https://github.com/ggerganov/whisper.cpp

You can try my C/C++ port of Whisper:

https://github.com/ggerganov/whisper.cpp

No dependencies, no Python, runs efficiently on the CPU.