self-hosted?

"You are in control of your data. Leon lives on your server"

Speech-to-Text: Google Cloud, IBM Watson, Coqui STT, Alibaba Cloud (coming soon), Microsoft Azure (coming soon)

So the AI assistant lives on my server, but if I want good-quality speech recognition, everything I say is sent through a US cloud service. The only offline option, Coqui, has a 7.5% word error rate [1] on LibriSpeech test-clean, which is worse than Baidu's Deep Speech 2 from 2016 [2]. State of the art would be around 1.4% [3], meaning 81% fewer errors than Coqui.

[1] https://coqui.ai/blog/stt/deepspeech-0-6-speech-to-text-engi...

[2] https://paperswithcode.com/paper/deep-speech-2-end-to-end-sp...

[3] https://paperswithcode.com/paper/pushing-the-limits-of-semi-...
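For anyone checking the 81% figure: it is the relative reduction in word error rate, not the difference in percentage points. A quick sketch of the arithmetic, using only the two WER numbers quoted above (the function name is just something I made up for illustration):

```python
def relative_reduction(baseline_wer, improved_wer):
    """Fraction of the baseline's errors that the improved system avoids."""
    return (baseline_wer - improved_wer) / baseline_wer


coqui_wer = 7.5  # % WER on LibriSpeech test-clean, per [1]
sota_wer = 1.4   # % WER, per [3]

# (7.5 - 1.4) / 7.5 = 0.813..., i.e. roughly 81% fewer errors
print(f"{relative_reduction(coqui_wer, sota_wer):.0%}")
```

In absolute terms the gap is 6.1 percentage points, which works out to roughly one misrecognized word in 13 versus one in 71.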

They might be interested in integrating Vosk, a speech-to-text engine that is just a shared library (.so file on Linux) and comes with API support for a variety of languages:

https://alphacephei.com/vosk/

https://github.com/alphacep/vosk-api
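To give an idea of how little glue code that takes, here is a rough sketch of fully offline transcription with the Vosk Python bindings. It assumes `pip install vosk`, a model directory downloaded from the Vosk site, and a mono 16-bit PCM WAV file; the paths and the `join_segments` helper are my own placeholders, not anything Leon or Vosk ships.

```python
import json
import wave


def join_segments(json_results):
    """Join the "text" fields of Vosk's JSON result strings, skipping empty ones."""
    texts = (json.loads(r).get("text", "") for r in json_results)
    return " ".join(t for t in texts if t)


def transcribe(wav_path, model_dir):
    """Transcribe a mono 16-bit PCM WAV file offline (assumed input format)."""
    # Imported here so the helper above works without Vosk installed.
    from vosk import Model, KaldiRecognizer

    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
        results = []
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when an utterance has ended
            if rec.AcceptWaveform(data):
                results.append(rec.Result())
        # Flush whatever is still buffered in the recognizer
        results.append(rec.FinalResult())
    return join_segments(results)


# Hypothetical usage; both paths are placeholders:
# print(transcribe("command.wav", "models/vosk-model-small-en-us"))
```

Nothing leaves the machine in this setup: the model is loaded from local disk and the audio is fed to the recognizer in chunks.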

Still, I've found that the big players have much better recognition models, and the post-processing that I assume they do (grammatical, maybe syntactical inferences that improve the end result) is probably much more powerful too.