Is there a good non-internet-connected device for turning on lights with your voice?
I built something just for fun a couple of weeks back. I used the Sphinx Open Source Speech Recognition Toolkit from CMU (https://cmusphinx.github.io/). Look for the sphinxbase and pocketsphinx packages under Ubuntu.
It'd be simple enough to hook this up to Philips Hue or whatever to do what you want.
There are numerous Sphinx language bindings. I went for Ruby via Isabella (https://github.com/chrisvfritz/isabella). I used this because it provided a framework for what I wanted: define a simple grammar (JSGF, Java Speech Grammar Format) and call one or more specified scripts with the parsed results. The hardest part was probably mapping out the phonemes for the grammar atoms. (If your target language isn't English, you may be out of luck.)
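For anyone curious what "grammar in, script call out" looks like, here is a minimal sketch in Python rather than Ruby. It is not Isabella's actual API: the grammar, the regex, and the `./lights.sh` script name are all made up for illustration, and the recognizer itself is left out, so `dispatch` simply takes the text a decoder would hand back.

```python
import re

# Hypothetical JSGF grammar of the kind described above; a real setup would
# hand this to the recognizer instead of keeping it around as a string.
LIGHTS_GRAMMAR = """#JSGF V1.0;
grammar lights;
public <command> = hey isabella turn (on | off) the (kitchen | bedroom) light;
"""

# Regex mirroring the grammar, used only to pull the slots out of decoded text.
COMMAND_RE = re.compile(r"^hey isabella turn (on|off) the (kitchen|bedroom) light$")


def parse(hypothesis):
    """Return (state, room) if the text matches the grammar, else None."""
    match = COMMAND_RE.match(hypothesis.strip().lower())
    return match.groups() if match else None


def dispatch(hypothesis):
    """Build the argv for a (hypothetical) script from the parsed result."""
    parsed = parse(hypothesis)
    if parsed is None:
        return None
    state, room = parsed
    # "./lights.sh" is a placeholder; pass this list to subprocess.run to execute it.
    return ["./lights.sh", room, state]
```

Keeping the grammar this small is what makes the slot extraction trivial: anything the regex doesn't match is simply ignored.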
This worked really well for what I needed (directing Band-in-a-Box from the other side of the room) but still needs a little tuning. Even with a leading activation token ("Hey Isabella...") she sometimes gets confused and thinks she's been summoned when it's just some random song playing. Choosing a concise, simple, unambiguous grammar helped, as did adjusting the sensitivity. There are other knobs to twiddle -- as evidenced by the academic mailing list activity -- but I didn't need to look closer for my simple use case.
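A rough sketch of the gating idea, again in Python and again hypothetical: it assumes the recognizer gives you a text hypothesis plus some confidence score between 0 and 1, which is not something every setup exposes in that form. This is a filter applied after decoding, not the recognizer's own sensitivity setting, and the 0.6 threshold is a made-up starting point.

```python
WAKE_PHRASE = "hey isabella"   # activation token from the post
MIN_CONFIDENCE = 0.6           # made-up threshold; tune for your room and microphone


def accept(hypothesis, confidence):
    """Return the command text after the wake phrase, or None to ignore the utterance."""
    text = hypothesis.strip().lower()
    if confidence < MIN_CONFIDENCE:
        return None  # too uncertain, probably background music or chatter
    if not text.startswith(WAKE_PHRASE + " "):
        return None  # no activation token, or nothing said after it
    command = text[len(WAKE_PHRASE):].strip()
    return command or None
```

Raising the threshold trades false wake-ups for missed commands, so it is worth logging rejected utterances for a while before settling on a value.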
It was a fun little project and the kids liked it, especially paired with a text-to-speech module (tts gem under Ruby): "Hey Isabella, am I ?" (or "Is ?"), and a randomly generated response :)
I ended up using a set of JSGF grammars (one per intent) to generate a statistical language model for use with pocketsphinx. Rhasspy also features a web-based interface for creating custom words -- I have a mapping from Sphinx phonemes to eSpeak phonemes so you can iterate over a pronunciation until it sounds right.
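One way to picture the grammar-to-language-model step is to expand each grammar into its sentences and count n-grams over them. The Python sketch below is not Rhasspy's actual code: the intent names and templates are made up, and the templates only support flat `(a | b)` alternatives rather than full JSGF.

```python
import itertools
import re
from collections import Counter

# Hypothetical intents, one template each; "(a | b)" marks alternatives.
TEMPLATES = {
    "LightOn": "turn on the (kitchen | bedroom) light",
    "LightOff": "turn off the (kitchen | bedroom) light",
}


def expand(template):
    """Yield every sentence a template with flat (a | b) groups can produce."""
    parts = re.split(r"\(([^()]*)\)", template)
    # Odd indices hold the contents of a group, even indices the literal text.
    choices = [
        [alt.strip() for alt in part.split("|")] if i % 2 else [part.strip()]
        for i, part in enumerate(parts)
    ]
    for combo in itertools.product(*choices):
        yield " ".join(word for word in combo if word)


def bigram_counts(templates):
    """Count word bigrams, with sentence boundary markers, over all expansions."""
    counts = Counter()
    for template in templates.values():
        for sentence in expand(template):
            words = ["<s>"] + sentence.split() + ["</s>"]
            counts.update(zip(words, words[1:]))
    return counts
```

A real toolchain would go on to turn counts like these into smoothed probabilities, typically written out in ARPA format, which pocketsphinx can load.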
As you mentioned, the wake/hotword stuff with Sphinx isn't terribly robust. I've been Docker-izing Mycroft Precise (https://github.com/MycroftAI/mycroft-precise) to address this.