I ran into some minor glitches trying to install and use DeepSpeech a couple of days ago. I’m sure they’ll be fixed soon enough, but meanwhile I hope this helps: https://www.phpied.com/taking-mozillas-deepspeech-for-a-spin...
It only works on short audio clips, about 5 seconds or so. (We should have documented this better; I just put in a PR adding it to the documentation.)
However, you can use voice activity detection (VAD), for example webrtcvad from PyPI, to chop long audio into smaller chunks that the model can digest.
Maybe we should just put VAD in the client and have this occur automatically?
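The chopping step can be sketched roughly like this. This is a minimal stand-alone sketch, not the actual DeepSpeech client code: it uses a simple energy threshold in place of a real VAD (with webrtcvad you would create a `webrtcvad.Vad` and call `vad.is_speech(frame, sample_rate)` instead), and the frame size, threshold, and 5-second cap are illustrative assumptions:

```python
import struct

SAMPLE_RATE = 16000   # DeepSpeech expects 16 kHz mono 16-bit PCM
FRAME_MS = 30         # webrtcvad accepts 10/20/30 ms frames
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 2 bytes per sample
MAX_SEGMENT_S = 5     # keep each chunk around the ~5 s limit

def is_speech(frame, threshold=500):
    """Placeholder VAD: mean absolute amplitude over a threshold.
    A real implementation would call vad.is_speech(frame, SAMPLE_RATE)."""
    samples = struct.unpack("<%dh" % (len(frame) // 2), frame)
    return sum(abs(s) for s in samples) / len(samples) > threshold

def chop(audio, threshold=500):
    """Split raw PCM bytes into speech segments, each under MAX_SEGMENT_S."""
    max_frames = MAX_SEGMENT_S * 1000 // FRAME_MS
    segments, current = [], []
    for i in range(0, len(audio) - FRAME_BYTES + 1, FRAME_BYTES):
        frame = audio[i:i + FRAME_BYTES]
        if is_speech(frame, threshold):
            current.append(frame)
            if len(current) >= max_frames:  # segment hit the length cap
                segments.append(b"".join(current))
                current = []
        elif current:                       # silence closes the segment
            segments.append(b"".join(current))
            current = []
    if current:
        segments.append(b"".join(current))
    return segments
```

Each returned segment can then be fed to the model individually; cutting on silence rather than at arbitrary offsets avoids splitting words in half.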
Out of interest, do you also work on the reverse problem, text-to-speech? Most open source engines sadly still can't compete with commercial alternatives.
Maybe Tacotron will interest you? It's an end-to-end model that's reasonably close to the state of the art:
https://google.github.io/tacotron/publications/tacotron/inde...
There are some open source implementations.
Edit: Another interesting one: http://research.baidu.com/deep-voice-3-2000-speaker-neural-t...