What does Hacker News think of Whisper?
Robust Speech Recognition via Large-Scale Weak Supervision
"Adding --task translate will translate the speech into English."
I tried translating a conference with a German speaker. The transcription was superb, but the translation not so much.
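For anyone who wants to reproduce this, the flag maps directly onto the Python API; a minimal sketch (the model size and file name are placeholders of mine):

    import whisper

    # "medium" and up tend to translate better than the small models
    model = whisper.load_model("medium")

    # task="translate" is the API equivalent of the --task translate flag:
    # it emits English text regardless of the spoken language
    result = model.transcribe("talk_de.mp3", task="translate")
    print(result["text"])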
I picked a random passage from a novel in French I am currently reading. ChatGPT translated the three paragraphs I ran it on correctly; there are no major quibbles to be had. It is good, coherent English, a correct translation, which closely follows the French original, even capturing some of the poetic imagery effectively.
I'm sure after another paragraph or two there will be a weird screw-up. And there's no consistency in a running translation of any length. Etc. Yes, it's not perfect. Not fully human-equivalent.
Still. I remember when machine translation like what I just did was the realm of science fiction. And I thought it would remain science fiction for a long time. The fact that such a thing isn't mind-blowing anymore shows how far things have come, doesn't it?
> Speech recognition - Siri still has major issues understanding me.
I am using speech-to-text AI transcription every day. It's been revolutionary for me. I am hard of hearing. The cutting edge is Whisper, and it is leaps and bounds beyond the state of the art from just a year ago: https://github.com/openai/whisper
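If anyone wants to try the same workflow, everyday transcription with that repo looks roughly like this (the audio file name is a placeholder):

    import whisper

    model = whisper.load_model("base.en")   # English-only model, runs fine on CPU
    result = model.transcribe("meeting.wav")

    # each segment comes back with start/end timestamps
    for seg in result["segments"]:
        print(f"[{seg['start']:7.1f}s] {seg['text']}")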
My understanding is that the only reason OpenAI even set up the paid API[0] is that it "can also be hard to run [sic]". Personally, I'm skeptical. I'm not knocking them for it, but I could see how this is just capitalizing on the brand.
[0]: https://openai.com/blog/introducing-chatgpt-and-whisper-apis...
For this model specifically (https://github.com/openai/whisper) it would be a significant challenge for a newcomer. Luckily, Hugging Face has a blog post that will get you started: https://huggingface.co/blog/fine-tune-whisper
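To give a flavour of what that post walks through, a bare-bones supervised step with the Hugging Face checkpoints looks something like this. The audio here is a silent dummy clip standing in for a labelled example; the blog post wraps this in a proper dataset and Trainer loop:

    import numpy as np
    from transformers import WhisperProcessor, WhisperForConditionalGeneration

    processor = WhisperProcessor.from_pretrained("openai/whisper-small")
    model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

    # one second of silence at 16 kHz, standing in for real training audio
    audio = np.zeros(16000, dtype=np.float32)
    inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
    labels = processor.tokenizer("a ground-truth transcript",
                                 return_tensors="pt").input_ids

    # a single fine-tuning step: forward pass with labels yields the loss
    loss = model(input_features=inputs.input_features, labels=labels).loss
    loss.backward()
    print(float(loss))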
From the OpenAI paper and release notes: "We are releasing models and inference code to serve as a foundation for further work on robust speech processing." So I guess they are either truly altruistic in this, or they are planning on monetising whatever they build on top of it.
Also, OpenAI is a startup (if we can call it that), so its value right now is more about being impressive and looking like future value than about showing an immediate route to profit.
How is https://github.com/openai/whisper not open source?
When I do need to recall, I just use the strategy of "hold the items in my head as intently as possible, for as little time as possible until I'm able to get to my phone or some paper".
On the rare occasions when neither my phone nor paper is immediately available, I try to visualize the ideas, projecting them onto a mental canvas, and try to use connections between them or mnemonics to remember them as best I can.
I avoid the problem of "I sit down in front of my computer in order to write down or do a thing" by building the discipline necessary to prevent myself from getting distracted in that manner.
It does speech-to-text, then you can use the full force of all the text analysis tools that are out there.
[0] https://github.com/openai/whisper
[1] https://github.com/mozilla/TTS
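As a toy example of that hand-off, once Whisper has produced plain text you can throw any ordinary text tooling at it (the file name is a placeholder, and the "analysis" here is deliberately crude):

    import whisper
    from collections import Counter

    model = whisper.load_model("base")
    text = model.transcribe("podcast.mp3")["text"]

    # downstream text analysis: a naive keyword count
    words = [w.strip(".,!?").lower() for w in text.split()]
    print(Counter(words).most_common(10))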
Depending on how you define "work", Whisper also works with Hebrew. Not sure if the word error rate is acceptable, though: https://github.com/openai/whisper/#available-models-and-lang...
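Language detection is automatic, but you can pin the language explicitly; a sketch using the usual Python API (the file name is a placeholder):

    import whisper

    # the larger multilingual models generally do better on non-English
    # speech, so "large" is a reasonable starting point here
    model = whisper.load_model("large")
    result = model.transcribe("hebrew_clip.mp3", language="he")
    print(result["text"])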
They also released the Whisper model and code[2].
In all fairness, their release of Whisper[0] last week is actually really amazing. Like CLIP, it has the ability to spawn a lot of further research and work thanks to the open source aspect of it. I hope OpenAI learns from this, downgrades the "safety" shills, and focuses on producing more high-quality open source work, both code and models, which will move the field forward.
Perhaps it will encourage people to add voice commands to their apps, which can then be sent to GPT-3.
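A sketch of that pipeline, using the GPT-3-era completions endpoint (the file name, prompt, and key are all placeholders of mine):

    import openai
    import whisper

    openai.api_key = "sk-..."  # placeholder key

    # transcribe the spoken command, then hand the text to GPT-3
    command = whisper.load_model("base").transcribe("command.wav")["text"]

    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"Turn this spoken request into an app action:\n{command}",
        max_tokens=64,
    )
    print(response.choices[0].text.strip())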