What does HackerNews think of larynx?

End to end text to speech system using gruut and onnx

Language: Python

The most exciting thing about Home Assistant's "Year of the Voice", for me, is that it is apparently enabling/supporting @synesthesiam's continued phenomenal contributions to the FLOSS off-line voice synthesis space.

The quality, variety & diversity of voices that synesthesiam's "Larynx" TTS project (https://github.com/rhasspy/larynx/) made available completely transformed the Free/Open Source Text To Speech landscape.

In addition, "OpenTTS" (https://github.com/synesthesiam/opentts) provided a common API for interacting with multiple FLOSS TTS projects, which showed great promise for actually enabling "standing on the shoulders of" earlier work rather than re-inventing the same basic functionality every time.

The new "Piper" TTS project mentioned in the article is the apparent successor to Larynx and, along with the accompanying LibriTTS/LibriVox-based voice models, brings to FLOSS TTS something it's never had before:

* Too many voices! :)

Seriously, the current LibriTTS voice model version has 900+ voices (of varying quality levels). How do you even navigate that many?![0]

And that's not even considering the even higher quality single speaker models based on other audio recording sources.

Offline TTS, while immensely valuable for individuals, doesn't seem to be an attractive domain for most commercial entities due to the lack of lock-in/telemetry opportunities, so I was concerned that we might end up missing out on further valuable contributions from synesthesiam's specialised skills & experience due to financial realities & the human need for food. :)

I'm glad we instead get to see what happens next.

[0] See my follow-up comment about this.

If you've not already encountered them I'd definitely encourage you to check out these Free/Open Source projects too:

* Larynx: https://github.com/rhasspy/larynx/

* OpenTTS: https://github.com/synesthesiam/opentts

* Likely Mimic3 in the near future: https://mycroft.ai/blog/mimic-3-preview/

Larynx in particular has a focus on "faster than real-time", while OpenTTS is an attempt to package & provide a common REST API to all Free/Open Source Text To Speech systems, so the FLOSS ecosystem can build on previous work supported by short-lived business interests rather than start from scratch every time.
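To give a rough idea of what "a common REST API" means in practice, here is a minimal Python sketch. It assumes an OpenTTS server running locally on port 5500 and exposing a `/api/tts` endpoint that takes `voice` and `text` query parameters; check the README of the version you run, as this may differ. The helper names `build_tts_url` and `fetch_wav` and the voice ID in the usage note are hypothetical, made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_tts_url(text, voice, base_url="http://localhost:5500"):
    """Build a GET URL for an OpenTTS-style /api/tts endpoint.

    Assumption: the server accepts `voice` and `text` as query parameters.
    """
    query = urlencode({"voice": voice, "text": text})
    return f"{base_url.rstrip('/')}/api/tts?{query}"


def fetch_wav(text, voice, base_url="http://localhost:5500", timeout=30):
    """Request synthesized audio and return the raw response bytes.

    Needs a running server; nothing here is called at import time.
    """
    with urlopen(build_tts_url(text, voice, base_url), timeout=timeout) as resp:
        return resp.read()
```

With a server running, something like `open("out.wav", "wb").write(fetch_wav("Hello world", "larynx:example_voice"))` would save the audio. The idea, as I understand it, is that changing only the `voice` value is enough to switch between the underlying TTS systems.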

AIUI the developer of the first two projects now works for Mycroft AI & is involved in the development of Mimic3, which seems very promising given how much of an impact on quality his solo work has had in just the past couple of years or so.

I imagine that our concept of what a villain sounds like tends to be extremely personally biased, but here are a couple of options [Advisory: Contains threatening language.]:

* http://www.sndup.net/p33q

* http://www.sndup.net/sppn

I created these samples in a relatively short time using the Free/Open Source (which I think is an important factor for indies) text-to-speech project Larynx & a narrative editor I finally released the other weekend:

* https://github.com/rhasspy/larynx/

* https://rancidbacon.itch.io/dialogue-tool-for-larynx-text-to...

Now, I would really like to link you directly to audio of the next two but considering it's currently in beta behind an (automated response) email address, I think that may not be appropriate, so, instead...

* Visit & get access to the beta here: https://mycroft.ai/blog/mimic-3-preview/

* Copy & paste this SSML into the form: https://pastebin.com/Bwd7LCbj

It's definitely a noticeable step up again in quality.

There's an alternate pair of voices if you move the "_" from one "name" attribute to the other in each "voice" element.
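If you'd rather not do that swap by hand, here is a rough Python sketch. It assumes each "voice" element carries the active voice in a `name` attribute and the alternate in a `_name` attribute, which is only my reading of the description above and not confirmed against the pastebin; it also assumes the SSML uses no XML namespace. The function name `swap_alternate_voices` and the voice IDs in the usage note are placeholders.

```python
import xml.etree.ElementTree as ET


def swap_alternate_voices(ssml):
    """Swap the `name` and `_name` attribute values on every <voice> element.

    Assumption: `name` holds the active voice and `_name` the parked
    alternate. Elements missing either attribute are left untouched.
    """
    root = ET.fromstring(ssml)
    for voice in root.iter("voice"):
        active = voice.get("name")
        parked = voice.get("_name")
        if active is None or parked is None:
            continue
        voice.set("name", parked)
        voice.set("_name", active)
    return ET.tostring(root, encoding="unicode")
```

For example, passing `'<speak><voice name="voice_a" _name="voice_b">Hello.</voice></speak>'` through the function leaves the text alone and makes `voice_b` the active voice.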

I intentionally didn't edit the text to remove some of the artifacts both to give a realistic impression of the current state & because sometimes they add interesting texture. :)

Note the beta voices are "low" quality.

On a Tangent:

As I switched from macOS to Arch Linux, I was looking for a good text-to-speech option that is cross-platform and open source. (I use it to get a first overview of larger, boring papers I'm reviewing.)

Pretty impressed by Larynx so far: https://github.com/rhasspy/larynx

It also seems quite hackable.

Will also play with NATSpeech.

Yes!

The project is called Larynx, and it is amazing: https://github.com/rhasspy/larynx/

I waxed lyrical about it recently in this thread about private alternatives to Alexa: https://news.ycombinator.com/item?id=29562526

I can only vouch for the quality/variety in English, but the project lists support for 50 voices across 9 languages, including all of the first group of languages you mentioned, and also Russian. (I've "played" with all those languages to test them but can't really vouch for how a native speaker/listener might find it. :D )

It is miles ahead of any of the other Free/Open Source TTS solutions I've tried, including the ones you mentioned.

(It's still synthesized speech, but the output quality is so good, and the project is still in its extremely early days.)

And there's a range of options in accent & gender--which are in general sorely lacking in other FLOSS TTS options. (In terms of licensing, some voices are licensed more freely than others but the majority are without significant restriction.)

I like Larynx so much that I've been working on an editor for it to assist in "auditioning" & recording speech in a narrative context, e.g. game/film pre-viz.