Is this really better than Eleven Labs?

Just listening to it, it's subjectively not better, but if it's more than 10x faster or cheaper, I would use it anyway -- it's good enough to be listenable.

Eleven Labs is the first voice synthesis service good enough that I'd listen to an audiobook generated with it, but at current pricing it would cost about $100 to synthesize a 10-hour audiobook. A little too expensive. If they could get it down to $10, I'd cancel my Audible subscription and just synthesize audio from ebook text.
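To put numbers on that, here is a rough sketch of the arithmetic in Python. The only inputs are the figures quoted above ($100 and $10 for a 10-hour book); the function name is just something made up for illustration, and none of this reflects an actual price list.

```python
def cost_per_hour(total_cost_usd: float, hours: float) -> float:
    """Cost of one hour of synthesized audio, in USD."""
    return total_cost_usd / hours


# Figures quoted in the comment above (not an official price list).
current_rate = cost_per_hour(100, 10)  # roughly what it costs today
target_rate = cost_per_hour(10, 10)    # the price I'd be willing to pay

# How many times cheaper it would need to get.
reduction_needed = current_rate / target_rate

print(f"current: ${current_rate:.2f}/hour")
print(f"target:  ${target_rate:.2f}/hour")
print(f"needs to be {reduction_needed:.0f}x cheaper")
```

So the gap works out to $10 per hour of audio versus $1 per hour, which is the same 10x factor mentioned earlier.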

So if I can get a locally running Voicebox model and just leave it running on my laptop overnight synthesizing an audiobook, that's even better.

Have you tried Tortoise TTS? I believe Eleven Labs basically forked that and improved the voice quality and speed.

How does it compare to Voicebox in quality?

I would say that properly configured Tortoise is better, but that comes with the massive caveat that Tortoise:

1 - Is a real pain to get 'working right' - it's not even remotely batteries included

and, more importantly:

2 - Is incredibly slow. I've been turning Heart of Darkness into an audiobook as a unit test and it takes ~30m per paragraph, on average. Add to that the occasional hiccup where a block gets synthesized badly (Tortoise occasionally 'drops out' of its selected voice) and Tortoise only really works if you have a ton of compute and you still don't mind waiting forever.
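For a sense of scale, here is a back-of-the-envelope estimate in Python. The ~30 minutes per paragraph is the figure from above; the paragraph count and the redo rate are placeholder assumptions (I haven't counted the paragraphs in the book or measured how often a block has to be redone), so plug in your own numbers.

```python
def estimate_hours(paragraphs: int,
                   minutes_per_paragraph: float = 30.0,
                   redo_rate: float = 0.0) -> float:
    """Estimated wall-clock hours to synthesize a book.

    redo_rate is the assumed fraction of paragraphs that come out badly
    and get synthesized a second time (each redone once).
    """
    total_minutes = paragraphs * minutes_per_paragraph * (1.0 + redo_rate)
    return total_minutes / 60.0


# Placeholder inputs: 200 paragraphs and a 10% redo rate are assumptions,
# not measurements.
hours = estimate_hours(paragraphs=200, minutes_per_paragraph=30, redo_rate=0.10)
print(f"~{hours:.0f} hours, or ~{hours / 24:.1f} days of continuous compute")
```

Even with generous guesses, a whole book at this speed takes days rather than an overnight run.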

FYI there’s also this fork for faster inference: https://github.com/152334H/tortoise-tts-fast