What does HackerNews think of py-webrtcvad?

Python interface to the WebRTC Voice Activity Detector

Language: C

Haven’t tried it yet but love the concept!

Have you thought of using VAD (voice activity detection) for breaks? Back in my day (a long time ago) the webrtc VAD stuff was considered decent:

https://github.com/wiseman/py-webrtcvad

Model isn’t optimized for this use but I like where you’re headed!
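For what it's worth, here is a rough sketch of how per-frame VAD decisions could be turned into break points. It assumes you already have one speech/non-speech boolean per fixed-length frame (for example from py-webrtcvad). The function name `find_breaks` and the `min_break_ms` threshold are made up for illustration and are not part of any library.

```python
from typing import List, Sequence, Tuple


def find_breaks(
    flags: Sequence[bool],
    frame_ms: int = 30,
    min_break_ms: int = 300,
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) spans of consecutive non-speech frames
    that last at least min_break_ms.

    flags[i] is True when frame i was classified as speech.
    """
    breaks: List[Tuple[int, int]] = []
    run_start = None  # index of the first frame in the current silent run

    for i, is_speech in enumerate(flags):
        if not is_speech:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # A silent run just ended; keep it only if it is long enough.
            if (i - run_start) * frame_ms >= min_break_ms:
                breaks.append((run_start * frame_ms, i * frame_ms))
            run_start = None

    # Handle a silent run that extends to the end of the audio.
    if run_start is not None and (len(flags) - run_start) * frame_ms >= min_break_ms:
        breaks.append((run_start * frame_ms, len(flags) * frame_ms))

    return breaks
```

Short pauses (a breath, a stop consonant) fall under the threshold and are ignored, so only longer silences count as breaks. The 300 ms default is a guess and would need tuning for real recordings.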

As part of ETL, or just to get a basic understanding of how speech data is handled, try this tool: https://github.com/wiseman/py-webrtcvad

It is a Python wrapper for a voice activity detection library. It works well as a starting point when working on speech recognition problems. It helped me understand and discover a lot of concepts related to audio signals and data when I was in your shoes.
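To give a feel for what working with it looks like, here is a minimal sketch. The library expects 16-bit mono PCM at 8000, 16000, 32000 or 48000 Hz, cut into frames of 10, 20 or 30 ms. The helpers `frames`, `speech_flags` and `webrtc_flags` are illustrative names of my own; only `webrtcvad.Vad(mode)` and `vad.is_speech(frame, sample_rate)` come from the package, which needs to be installed separately.

```python
from typing import Callable, Iterator, List

VALID_RATES = (8000, 16000, 32000, 48000)
VALID_FRAME_MS = (10, 20, 30)


def frames(pcm: bytes, sample_rate: int, frame_ms: int = 30) -> Iterator[bytes]:
    """Yield fixed-size frames from raw 16-bit mono PCM bytes.

    A trailing partial frame is dropped, since the VAD rejects odd sizes.
    """
    if sample_rate not in VALID_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if frame_ms not in VALID_FRAME_MS:
        raise ValueError(f"unsupported frame length: {frame_ms} ms")
    bytes_per_frame = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    for offset in range(0, len(pcm) - bytes_per_frame + 1, bytes_per_frame):
        yield pcm[offset:offset + bytes_per_frame]


def speech_flags(
    pcm: bytes,
    sample_rate: int,
    is_speech: Callable[[bytes, int], bool],
    frame_ms: int = 30,
) -> List[bool]:
    """Classify every frame with the given is_speech(frame, sample_rate) callable."""
    return [is_speech(f, sample_rate) for f in frames(pcm, sample_rate, frame_ms)]


def webrtc_flags(pcm: bytes, sample_rate: int, mode: int = 2, frame_ms: int = 30) -> List[bool]:
    """Same as speech_flags, but backed by py-webrtcvad.

    mode is the aggressiveness, 0 (least) to 3 (most aggressive at
    filtering out non-speech).
    """
    import webrtcvad  # third-party: pip install webrtcvad

    vad = webrtcvad.Vad(mode)
    return speech_flags(pcm, sample_rate, vad.is_speech, frame_ms)
```

Keeping the classifier as a plain callable means you can swap in a stub for experiments and compare it against the real VAD on the same frames.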

Background sounds should not trigger voice. Typing should not trigger voice.

That's right. Voice activity detection (VAD) is not the same as sound detection. WebRTC even has a really good VAD built into it that is extremely easy to use and dynamically adapts to the current audio environment. See e.g. https://github.com/wiseman/py-webrtcvad and https://github.com/dpirch/libfvad for examples where the relatively small VAD code has been pulled out of the giant webrtc corpus.

People also need to know to enable AEC (acoustic echo cancellation) in their audio driver, which completely solves the problem of whatever sounds they're playing leaking into their mic.

Check out this one: https://github.com/wiseman/py-webrtcvad

If this one does not work for your application, perhaps look into simpler ones like the ones used in mobile telephone codecs or in Speex.