What does HackerNews think of tortoise-tts?

A multi-voice TTS system trained with an emphasis on quality

Language: Python

Nice to see this - also, looks like one of the primary authors is jbetker, who built the best open source TTS model that I've seen yet (TorToiSe):

https://nonint.com/2023/09/23/dall-e-3/

https://github.com/neonbjb/tortoise-tts

It was primarily being used to train TTS models (see https://github.com/neonbjb/tortoise-tts), which largely fit into a single GPU's memory. So, for data parallelism, x8 PCIe isn't that much of a concern.

https://github.com/neonbjb/tortoise-tts

Give this a try. You can run it locally if you have a good enough GPU. It is pretty slow at generation, though.

The bottleneck is currently TTS. The best option is probably Eleven Labs, but response times are unpredictable. GPT response times can be worked around by falling back to a faster model, but you can't do that with TTS because the voice needs to be consistent. It seems the current state of the art is diffusion models à la DALL-E, see e.g. [1] (the developer, James Betker, incidentally now works for OpenAI). It's nontrivial to turn this into something that works in real time without a decent budget, though.

Whisper (for transcription) is insanely fast and good.

1. https://github.com/neonbjb/tortoise-tts
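
The fallback idea mentioned above for GPT response times (try the preferred model, switch to a faster one if it takes too long) might look roughly like the sketch below. This is only an illustration under stated assumptions: `slow_model` and `fast_model` are hypothetical stand-ins that simulate latency with a sleep, not real API calls, and the timeout values are arbitrary.

```python
import asyncio


async def slow_model(prompt: str) -> str:
    # Hypothetical stand-in for a slower, higher-quality model call.
    await asyncio.sleep(0.3)
    return f"slow: {prompt}"


async def fast_model(prompt: str) -> str:
    # Hypothetical stand-in for a faster fallback model call.
    await asyncio.sleep(0.01)
    return f"fast: {prompt}"


async def respond(prompt: str, timeout_s: float) -> str:
    # Try the preferred model first; if it misses the deadline,
    # cancel it and fall back to the faster model.
    try:
        return await asyncio.wait_for(slow_model(prompt), timeout=timeout_s)
    except asyncio.TimeoutError:
        return await fast_model(prompt)


def respond_sync(prompt: str, timeout_s: float = 0.1) -> str:
    # Convenience wrapper for calling from non-async code.
    return asyncio.run(respond(prompt, timeout_s))
```

As the comment notes, the same trick does not carry over to TTS, since switching engines mid-conversation would change the voice.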

TorToiSe [0] is pretty good, but I agree 11 is currently state of the art. Won't be long until GP is correct though. 1.5 years at best is my guess. The next moat will be multiple languages and maybe something like more control over the tone, which is perhaps more suited to a product.

[0] https://github.com/neonbjb/tortoise-tts

Nice work! Has anybody compared this to TorToiSe [1]?

[1] https://github.com/neonbjb/tortoise-tts

If you build a product, no matter what, you have to be honest with yourself, and imho most of the neural voices from Azure sound better than your example. They may miss some of the timbre of your voices, but the timbre comes from the examples you fed it... tbh it's not much better than doing it yourself with something like https://github.com/neonbjb/tortoise-tts

This blog is by the author of Tortoise-TTS (stylised as TorToiSe): https://github.com/neonbjb/tortoise-tts

Very cool work being done in ML these days, both inside large corps, and also by independent researchers/"hobbyists" (quotation marks because there is some really expert work being produced by them).

And from yesterday, an open-source version of Google's Imagen by lucidrains: https://github.com/lucidrains/imagen-pytorch

Some cutting-edge stuff is still being made by talented hackers using nothing but a rig of 8x 3090s: https://github.com/neonbjb/tortoise-tts

Other funding models are possible as well; in the grand scheme of things, the price for these models is small enough.

See also the recently published Tortoise TTS, which IMO sounds even better: https://github.com/neonbjb/tortoise-tts

TortoiseTTS might be the closest: https://github.com/neonbjb/tortoise-tts. It's a few-shot, multi-speaker model, so you need just 3-4 little clips to train new voices.