self-hosted?
"You are in control of your data. Leon lives on your server"
Speech-to-Text: Google Cloud, IBM Watson, Coqui STT, Alibaba Cloud (coming soon), Microsoft Azure (coming soon)
So the AI assistant lives on my server, but if I want good-quality speech recognition, everything I say is sent through a US cloud service. The only offline option, Coqui, has a 7.5% word error rate [1] on LibriSpeech test-clean, which is worse than Baidu's Deep Speech 2 from 2016 [2]. State of the art is around 1.4% [3], which means 81% fewer errors than Coqui.
[1] https://coqui.ai/blog/stt/deepspeech-0-6-speech-to-text-engi...
[2] https://paperswithcode.com/paper/deep-speech-2-end-to-end-sp...
[3] https://paperswithcode.com/paper/pushing-the-limits-of-semi-...
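For anyone wanting to check the 81% figure, here is a minimal sketch of the arithmetic. The function name is made up for illustration; the two inputs are the word error rates quoted above (7.5% for Coqui, roughly 1.4% for state of the art).

```python
def relative_error_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction of the baseline's errors that the improved system avoids.

    Both arguments are word error rates in the same unit (here: percent).
    The name is hypothetical, chosen only for this illustration.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


# WER figures quoted in the comment above, in percent.
coqui_wer = 7.5
sota_wer = 1.4

reduction = relative_error_reduction(coqui_wer, sota_wer)
print(f"{reduction:.1%}")  # prints "81.3%"
```

Put differently, at these rates Coqui would make a bit more than five times as many word errors as the state-of-the-art model on the same audio (7.5 / 1.4).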
https://github.com/alphacep/vosk-api
Still, I've found that the big players have much better recognition models, and the post-processing that I assume they do (grammatical, maybe syntactical inference that improves the end result) is probably much more powerful too.