Voice assistants are basically just mainstream non-visual command lines, and it's unsurprising to me that something that relies heavily on memorization and extremely specialized "skills" hasn't taken off the way it was imagined. A voice system that could do literally everything one can do with a keyboard and a mouse would be magical, but no system offers that.

Instead, it's a guessing game about syntax and semantics, and frequently a source of frustration. There are many failure points: it can miss the wake word, "hear" you wrong, hear correctly but interpret wrong, miss context clues, or simply be unable to process whatever the request is. In my experience, most normal people relegate voice commands to ultra-specific tasks like timers, weather, and music, and that's that. Google and Alexa are relatively good at "trivia" questions, but Siri is a complete failure. All systems have edge cases that make them brittle.

I think there's potential here. Cortana was the most promising: an assistant that's integrated into the OS and can change any setting or perform any on-screen action would, again, be really awesome. We just don't have that. I think OS-wide integration plus GPT-4 (or later) might get closer to what we expect, but it's just not great right now. I really want to be able to say something as unstructured as "hey siri, create alarms every 5 minutes starting at 6am tomorrow" or "hey siri, when I get home every day, turn on all of the lights, change my focus to personal, and turn on the news". There /is/ power to be had, but nobody has really tapped it.
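To make the alarm example concrete: once the request is understood, the expansion itself is trivial. A rough sketch, where `expand_alarms` and its parameters are names I made up for illustration, and `count` is a pure assumption since the spoken request never says how many alarms to create:

```python
from datetime import datetime, time, timedelta


def expand_alarms(now, start=time(6, 0), every_minutes=5, count=6):
    """Expand 'alarms every N minutes starting at <start> tomorrow'.

    `count` is an assumption: the spoken request does not say when to stop.
    """
    tomorrow = (now + timedelta(days=1)).date()
    first = datetime.combine(tomorrow, start)
    step = timedelta(minutes=every_minutes)
    # One alarm per step, beginning exactly at the start time.
    return [first + i * step for i in range(count)]


# Hypothetical "now", chosen only so the example is reproducible.
alarms = expand_alarms(datetime(2024, 3, 1, 22, 30))
for alarm in alarms:
    print(alarm.strftime("%Y-%m-%d %H:%M"))
```

The arithmetic is the easy part. The hard part is getting from the spoken sentence to `start`, `every_minutes`, and a sensible stopping point, which is exactly where current assistants fall over.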

I tried Amazon's Alexa, the top-end model with a display. Often it would taunt you about new/interesting things on the screen, but I could never get them to work. I had to memorize things to get even the basics working. Ended up unplugging it.

Google's Assistant, in comparison, worked great: no memorization, and very useful. Sure, the time, weather, timers, and alarms all worked well with a very flexible set of natural language queries. But so did more complex things, like what the temperature will be tomorrow at 10pm, simple calculations, and unit conversions. IMDb-like queries about directors, actors, which movies someone was in, etc. generally worked well too. It seemed to really understand things, not just "A web search returned ...". Even something as specific as the wheelbase of a 2004 WRX would return an answer, not a search result.

With all that said, I'm looking for a non-cloud/on-site solution, even if it requires more work. Most recently I noticed https://github.com/rhasspy/rhasspy