What does Hacker News think of Whisper?
Robust Speech Recognition via Large-Scale Weak Supervision
"Adding --task translate will translate the speech into English."
I tried translating a conference with a German speaker. The transcription was superb, but the translation not so much.
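For anyone who wants to reproduce this, the flag maps directly onto the Python API; a minimal sketch (the model size and file name are placeholders of mine):

    import whisper

    # "medium" and up tend to translate better than the small models
    model = whisper.load_model("medium")

    # task="translate" is the API equivalent of the --task translate flag:
    # it emits English text regardless of the spoken language
    result = model.transcribe("talk_de.mp3", task="translate")
    print(result["text"])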
I picked a random passage from a novel in French I am currently reading. ChatGPT translated the three paragraphs I ran it on correctly; there are no major quibbles to be had. It is good, coherent English, a correct translation, which closely follows the French original, even capturing some of the poetic imagery effectively.
I'm sure after another paragraph or two there will be a weird screw-up. And there's no consistency in a running translation of any length. Etc. Yes, it's not perfect. Not fully human-equivalent.
Still. I remember when machine translation like what I just did was the realm of science fiction. And I thought it would remain science fiction for a long time. The fact that such a thing isn't mind-blowing anymore shows how far things have come, doesn't it?
> Speech recognition - Siri still has major issues understanding me.
I am using speech-to-text AI transcription every day. It's been revolutionary for me. I am hard of hearing. The cutting edge is Whisper, and it is leaps and bounds beyond the state of the art from just a year ago: https://github.com/openai/whisper
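If anyone wants to try the same workflow, everyday transcription with that repo looks roughly like this (the audio file name is a placeholder):

    import whisper

    model = whisper.load_model("base.en")   # English-only model, runs fine on CPU
    result = model.transcribe("meeting.wav")

    # each segment comes back with start/end timestamps
    for seg in result["segments"]:
        print(f"[{seg['start']:7.1f}s] {seg['text']}")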
My understanding is that the only reason OpenAI even set up the paid API[0] is that it "can also be hard to run [sic]". Personally, I'm skeptical. I'm not knocking them for it, but I could see how this is just capitalizing on the brand.
[0]: https://openai.com/blog/introducing-chatgpt-and-whisper-apis...
For this model specifically (https://github.com/openai/whisper) it would be a significant challenge for a newcomer. Luckily, Hugging Face has a blog post that will get you started: https://huggingface.co/blog/fine-tune-whisper
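To give a flavour of what that post walks through, a bare-bones supervised step with the Hugging Face checkpoints looks something like this. The audio here is a silent dummy clip standing in for a labelled example; the blog post wraps this in a proper dataset and Trainer loop:

    import numpy as np
    from transformers import WhisperProcessor, WhisperForConditionalGeneration

    processor = WhisperProcessor.from_pretrained("openai/whisper-small")
    model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

    # one second of silence at 16 kHz, standing in for real training audio
    audio = np.zeros(16000, dtype=np.float32)
    inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
    labels = processor.tokenizer("a ground-truth transcript",
                                 return_tensors="pt").input_ids

    # a single fine-tuning step: forward pass with labels yields the loss
    loss = model(input_features=inputs.input_features, labels=labels).loss
    loss.backward()
    print(float(loss))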
From the OpenAI paper and release notes: "We are releasing models and inference code to serve as a foundation for further work on robust speech processing." So I guess they are either truly altruistic in this, or they are planning on monetising whatever they build on top of it.
Also, OpenAI is a startup (if we can call it that), so its value right now is more about being impressive and looking like future value than about showing an immediate route to profit.
How is https://github.com/openai/whisper not open source?
When I do need to recall, I just use the strategy of "hold the items in my head as intently as possible, for as little time as possible until I'm able to get to my phone or some paper".
On the rare occasions when neither my phone nor paper is immediately available, I try to visualize the ideas, projecting them onto a mental canvas, and try to use connections between them or mnemonics to remember them as best I can.
I avoid the problem of "I sit down in front of my computer in order to write down or do a thing" by building the discipline necessary to prevent myself from getting distracted in that manner.
It does speech-to-text, then you can use the full force of all the text analysis tools that are out there.
[0] https://github.com/openai/whisper
[1] https://github.com/mozilla/TTS
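As a toy example of that hand-off, once Whisper has produced plain text you can throw any ordinary text tooling at it (the file name is a placeholder, and the "analysis" here is deliberately crude):

    import whisper
    from collections import Counter

    model = whisper.load_model("base")
    text = model.transcribe("podcast.mp3")["text"]

    # downstream text analysis: a naive keyword count
    words = [w.strip(".,!?").lower() for w in text.split()]
    print(Counter(words).most_common(10))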
Depending on how you define "work", Whisper also works with Hebrew. Not sure if the word error rate is acceptable, though: https://github.com/openai/whisper/#available-models-and-lang...
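Language detection is automatic, but you can pin the language explicitly; a sketch using the usual Python API (the file name is a placeholder):

    import whisper

    # the larger multilingual models generally do better on non-English
    # speech, so "large" is a reasonable starting point here
    model = whisper.load_model("large")
    result = model.transcribe("hebrew_clip.mp3", language="he")
    print(result["text"])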
They also released the Whisper model and code[2].
In all fairness, their release of Whisper[0] last week is actually really amazing. Like CLIP, it has the ability to spawn a lot of further research and work thanks to the open source aspect of it. I hope OpenAI learns from this, downgrades the "safety" shills, and focuses on producing more high-quality open source work, both code and models, which will move the field forward.
Perhaps it will encourage people to add voice commands to their apps, which can then be sent to GPT-3.
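A sketch of that pipeline, using the GPT-3-era completions endpoint (the file name, prompt, and key are all placeholders of mine):

    import openai
    import whisper

    openai.api_key = "sk-..."  # placeholder key

    # transcribe the spoken command, then hand the text to GPT-3
    command = whisper.load_model("base").transcribe("command.wav")["text"]

    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"Turn this spoken request into an app action:\n{command}",
        max_tokens=64,
    )
    print(response.choices[0].text.strip())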