What does HackerNews think of py-webrtcvad?

Whisper – open source speech recognition by OpenAI | Sep 2022

Haven’t tried it yet but love the concept!

Have you thought of using VAD (voice activity detection) for breaks? Back in my day (a long time ago) the webrtc VAD stuff was considered decent:

https://github.com/wiseman/py-webrtcvad

Model isn’t optimized for this use but I like where you’re headed!
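To make the "VAD for breaks" idea concrete, here is a minimal sketch of turning per-frame speech/non-speech decisions into candidate break points. It assumes some VAD has already produced one boolean per fixed-length frame; the function name `find_breaks` and the 300 ms pause threshold are illustrative choices, not something from the comment or from any library.

```python
from typing import List, Tuple


def find_breaks(
    flags: List[bool], frame_ms: int = 30, min_pause_ms: int = 300
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) spans of non-speech lasting at least min_pause_ms.

    flags holds one entry per frame: True for speech, False otherwise.
    The 300 ms default is an assumed value; tune it for your audio.
    """
    breaks: List[Tuple[int, int]] = []
    run_start = None  # index of the first frame in the current silent run
    for i, is_speech in enumerate(flags):
        if not is_speech and run_start is None:
            run_start = i
        elif is_speech and run_start is not None:
            if (i - run_start) * frame_ms >= min_pause_ms:
                breaks.append((run_start * frame_ms, i * frame_ms))
            run_start = None
    # A silent run that reaches the end of the audio also counts.
    if run_start is not None and (len(flags) - run_start) * frame_ms >= min_pause_ms:
        breaks.append((run_start * frame_ms, len(flags) * frame_ms))
    return breaks
```

Audio could then be cut at, say, the midpoint of each returned span, so that chunks passed to a recognizer do not split words.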

Ask HN: I want to get started with Speech-to-Text. Where do I begin? | Jan 2022

As part of ETL, or just to get a basic understanding of how speech data is handled, try this tool: https://github.com/wiseman/py-webrtcvad

It is a Python wrapper for a voice activity detection library. It works as a starting point when working on speech recognition problems. It helped me understand and discover a lot of concepts related to audio signals and data when I was in your shoes.
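For a sense of what that starting point looks like, here is a rough sketch, assuming 16-bit mono PCM input. As far as I know, py-webrtcvad classifies short frames of 10, 20, or 30 ms at 8000, 16000, 32000, or 48000 Hz via `webrtcvad.Vad(mode).is_speech(frame, sample_rate)`. The helper names (`frame_generator`, `speech_flags`, `webrtc_is_speech`) are made up for this example, and the classifier is passed in as a callable so the framing logic can be tried without the package installed.

```python
from typing import Callable, Iterator, List

BYTES_PER_SAMPLE = 2  # assumes 16-bit mono PCM


def frame_generator(pcm: bytes, sample_rate: int, frame_ms: int = 30) -> Iterator[bytes]:
    """Yield consecutive fixed-length frames; a trailing partial frame is dropped."""
    if frame_ms not in (10, 20, 30):
        raise ValueError("frame_ms must be 10, 20, or 30")
    frame_bytes = sample_rate * frame_ms // 1000 * BYTES_PER_SAMPLE
    for offset in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[offset:offset + frame_bytes]


def speech_flags(
    pcm: bytes,
    sample_rate: int,
    is_speech: Callable[[bytes, int], bool],
    frame_ms: int = 30,
) -> List[bool]:
    """Run a per-frame classifier over the audio and collect its decisions."""
    return [is_speech(frame, sample_rate) for frame in frame_generator(pcm, sample_rate, frame_ms)]


def webrtc_is_speech(aggressiveness: int = 2) -> Callable[[bytes, int], bool]:
    """Build a classifier backed by py-webrtcvad (needs `pip install webrtcvad`).

    aggressiveness runs from 0 (least aggressive about filtering out
    non-speech) to 3 (most aggressive).
    """
    import webrtcvad  # imported lazily so the helpers above work without it

    return webrtcvad.Vad(aggressiveness).is_speech
```

With the package installed, `speech_flags(pcm, 16000, webrtc_is_speech())` should return one boolean per 30 ms frame. Looking at which frames flip between True and False is a decent way to get a feel for sample rates, frame sizes, and raw PCM.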

How Discord Handles Two and Half Million Concurrent Voice Users Using WebRTC | Sep 2018

> Background sounds should not trigger voice. Typing should not trigger voice.

That's right. Voice activity detection (VAD) is not the same as sound detection. WebRTC even has a really good VAD built into it that is extremely easy to use and dynamically adapts to the current audio environment. See e.g. https://github.com/wiseman/py-webrtcvad and https://github.com/dpirch/libfvad for examples where the relatively small VAD code has been pulled out of the giant webrtc corpus.

People also need to know to enable acoustic echo cancellation (AEC) in their audio driver, which completely solves the problem of whatever sounds they're playing leaking into their mic.

Speech and Language Processing, 3rd ed. draft (2017) | Jan 2018

Check out this one: https://github.com/wiseman/py-webrtcvad

If this one does not work for your application, perhaps look into simpler ones like the ones used in mobile telephone codecs or in Speex.