What does HackerNews think of dragonfly?

Speech recognition framework allowing powerful Python-based scripting and extension of Dragon NaturallySpeaking (DNS), Windows Speech Recognition (WSR), Kaldi and CMU Pocket Sphinx

Language: Python

#79 in Python
As someone who suffered some severe mobility impairment a few years ago and relied extensively on eye tracking for just over a year, https://precisiongazemouse.org/ (Windows) and https://talonvoice.com/ (multiplatform) are great. In my experience the hardware is already surprisingly good, in that you get accuracy to within an inch or half an inch depending on your training. Rather, it's all about the UX wrapped around it, as a few other comments have raised.

IMO Talon wins* for that by supporting voice recognition and mouth noises (think lip popping), which are less fatiguing than one-eye blinks for common actions like clicking. The creator is active here sometimes.

(* An alternative is to roll your own sort of thing with https://github.com/dictation-toolbox/dragonfly and other tools as I did, but it's a lot more effort)

I've experimented with Whisper. I don't know of a way to do commands without parsing dictation. Bottom line, to my knowledge the model processes audio in fixed 30-second windows, so if your utterance is 5 seconds, it gets padded out with 25 seconds of silence.
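
For illustration, here is a minimal sketch of that fixed window using the openai-whisper Python package (the model size and file name are placeholders):

    import whisper  # pip install openai-whisper

    model = whisper.load_model("base")           # placeholder model size
    audio = whisper.load_audio("utterance.wav")  # placeholder file
    # pad_or_trim zero-pads (or cuts) the waveform to exactly 30 seconds,
    # the fixed window the model's encoder consumes
    audio = whisper.pad_or_trim(audio)
    mel = whisper.log_mel_spectrogram(audio).to(model.device)
    result = whisper.decode(model, mel, whisper.DecodingOptions())
    print(result.text)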

Depending on the platform you're targeting, https://github.com/dictation-toolbox/dragonfly might be interesting to you.

I have been coding entirely by voice for approximately 10 years now (by hand long before that). Most of that time I have been using the Dragonfly (https://github.com/dictation-toolbox/dragonfly) library to construct my own customized voice coding system. The library is highly flexible and open source, allowing you to easily customize everything to suit what you need to be productive. It is perhaps the power user analogue to Dragon NaturallySpeaking. With it, you can certainly be highly productive coding by voice. However, it does require work to set up and customize to suit you, so it isn't really for the "general population" of computer users to just sit down and use. With regard to accuracy of speech recognition, being open allows you (with sufficient motivation) to train a custom acoustic speech model that recognizes your specific voice extremely well.
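
As a rough sketch of what that customization looks like, a minimal Dragonfly grammar might be something like this (the phrases are just examples):

    from dragonfly import Grammar, MappingRule, Dictation, Key, Text

    class ExampleRule(MappingRule):
        # spoken phrase -> action performed on the focused application
        mapping = {
            "save file": Key("c-s"),         # press Ctrl+S
            "new line": Key("enter"),
            "say <text>": Text("%(text)s"),  # type free-form dictation
        }
        extras = [Dictation("text")]

    grammar = Grammar("example")
    grammar.add_rule(ExampleRule())
    grammar.load()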

Regarding the software packages you referenced: Yes, Dragon is trash that I want nothing to do with, because of its inefficient interface, its complete inability to accurately understand my voice, and its generally shoddy software quality. Voice Computer (which I hadn't seen before) is therefore eliminated as well, though it doesn't look terrible as a front end to Dragon to better use the OS GUI-accessibility info. Many people like Talon, but I demand something open, which I can modify to suit my needs.

Background: I develop kaldi-active-grammar (https://github.com/daanzu/kaldi-active-grammar), a free and open source speech recognition backend usable by Dragonfly, itself entirely by voice. There's also a community of voice coders using Dragonfly and other tools that build on top of it, such as Caster (https://github.com/dictation-toolbox/Caster).
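
For a flavor of how the backend plugs in, pointing Dragonfly at the Kaldi engine is roughly this (the model directory is a placeholder; see the kaldi-active-grammar docs for actual setup):

    from dragonfly import get_engine

    # "kaldi_model" stands in for a downloaded kaldi-active-grammar model
    engine = get_engine("kaldi", model_dir="kaldi_model")
    engine.connect()
    # ... load your grammars here ...
    engine.do_recognition()  # run the recognition loop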

I am sorry to hear this. I think there are many people in a similar boat to you, and there are quite a few people working on command & dictation computing. Although my tool _may_ help you find out which speech systems work well for your voice/accent/mic/vocab, it might also be worth trying one of the specialist libraries built specifically for dictation and controlling computers.

I've not heard of Almond, but I have seen the following projects which might be helpful:

- Dragonfly: https://github.com/dictation-toolbox/dragonfly

- Demo: https://www.youtube.com/watch?v=Qk1mGbIJx3s / Software: https://github.com/daanzu/kaldi-active-grammar

Far-field audio is usually harder for any speech system to get right, so having a good quality mic and using it up close will _usually_ help with transcription quality. As a long-time Linux user, I would love to see it get some more powerful voice tools - I really hope that this opens up over the next few years. Feel free to drop me an email (on my profile); happy to help with setup on any of the above.

I think my best option is to write Python scripts with Dragonfly [0] to make a Visual Studio or VS Code extension that gives me accessibility, and/or a neovim plugin to let me say vim commands efficiently.

Dictating plaintext and copy/pasting works for writing code, but navigating through VS menus and code files and e.g. running unit tests is still a nightmare. Maybe an accessible mouse would help.

It sounds niche, but Visual Studio is perhaps the most popular IDE, and Dragon is the only real option for voice access on PCs. Any programmer without use of her hands would need this. It suggests that none of the programmers on the Dragon team have really dogfooded their product, at least for accessibility.

What I find truly heinous though is the Chrome plugin. It has two stars and thousands of reviews, and either doesn't work at all or breaks minutes in. When it works it's great, but it almost never does.

0. https://github.com/dictation-toolbox/dragonfly
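
As a rough sketch, spoken vim-style commands in Dragonfly could look something like this (the phrases and key bindings are just examples):

    from dragonfly import Grammar, MappingRule, IntegerRef, Key

    class VimRule(MappingRule):
        # spoken phrase -> keystrokes sent to the focused editor
        mapping = {
            "down <n>": Key("down:%(n)d"),  # press Down n times
            "up <n>": Key("up:%(n)d"),
            "delete line": Key("d, d"),     # vim 'dd'
            "write file": Key("escape, colon, w, enter"),
        }
        extras = [IntegerRef("n", 1, 100)]

    grammar = Grammar("vim")
    grammar.add_rule(VimRule())
    grammar.load()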

Talon actually uses the Mac's built-in STT if you don't have Dragon.

James from http://handsfreecoding.org/ was working on a fork of Dragonfly [0] to add support for Google's speech recognition, but I'm not sure if he still is. There are several barriers to that working well, though: additional latency really hurts, API usage costs add up, and (as far as I know) there's no way to specify a command grammar (Dragonfly/Vocola/Talon all let you use EBNF-like notation to define commands, which are preferentially recognized over free-form dictation).

[0]: https://github.com/dictation-toolbox/dragonfly
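
For reference, Dragonfly's EBNF-like spec syntax looks roughly like this (the command phrase is just an example):

    from dragonfly import Grammar, CompoundRule

    class WindowRule(CompoundRule):
        # ( | ) marks alternatives, [ ] marks optional words
        spec = "(open | close) [the] window"

        def _process_recognition(self, node, extras):
            print("heard:", " ".join(node.words()))

    grammar = Grammar("window commands")
    grammar.add_rule(WindowRule())
    grammar.load()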