I hope AI voice tech like https://beta.elevenlabs.io/ will make audiobooks even better.

I am waiting for the day when I can use my own computer to generate a high-quality reading of any text, so that I can listen to any book I like without waiting for an audiobook to be published.

sandreas

It's already possible, I'm working on the exact same thing.

Take a look at https://www.thorsten-voice.de/en/motivation-vision-english/. He is a German guy who is donating his voice to the community and provides an LJSpeech dataset (https://www.thorsten-voice.de/en/datasets-2/) as well as lots of information on how to use it (the quality is not as good as the commercial ones, but it really is usable).

While his voice is not that bad, I would prefer my favorite German narrators to read my audiobooks, so I'm working on creating custom datasets from existing audiobooks that I own. These of course cannot be published because that would violate the terms, but you can use ffmpeg to split an existing audiobook by silence and transcribe the pieces automatically using existing tools to create your own LJSpeech dataset.
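A rough sketch of the silence-splitting step, in case it helps. It assumes you have already run ffmpeg's silencedetect filter (something like `ffmpeg -i book.mp3 -af silencedetect=noise=-30dB:d=0.5 -f null -`) and captured its log output; the threshold values, the function name `speech_segments` and the `min_length` parameter are my own illustrative choices, not part of any existing tool. The function only turns the logged silence timestamps into (start, end) ranges of speech that you could then cut out with ffmpeg.

```python
import re

# silencedetect logs lines such as:
#   [silencedetect @ 0x...] silence_start: 4.5
#   [silencedetect @ 0x...] silence_end: 5.25 | silence_duration: 0.75
SILENCE_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
SILENCE_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def speech_segments(log_text, total_duration, min_length=1.0):
    """Return (start, end) tuples for the non-silent parts of the audio.

    log_text:       captured ffmpeg log containing silencedetect lines
    total_duration: length of the audio file in seconds
    min_length:     hypothetical cutoff; shorter segments are dropped
    """
    starts = [float(x) for x in SILENCE_START.findall(log_text)]
    ends = [float(x) for x in SILENCE_END.findall(log_text)]

    segments = []
    cursor = 0.0  # where the current stretch of speech began
    for i, silence_start in enumerate(starts):
        # ffmpeg can report a slightly negative start at the file beginning
        silence_start = max(silence_start, 0.0)
        if silence_start - cursor >= min_length:
            segments.append((cursor, silence_start))
        # a silence without a logged end runs to the end of the file
        cursor = ends[i] if i < len(ends) else total_duration

    # trailing speech after the last silence
    if total_duration - cursor >= min_length:
        segments.append((cursor, total_duration))
    return segments
```

Each resulting range can be cut into its own clip (for example with ffmpeg's `-ss` and `-to` options) before it is passed to a speech-to-text tool.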

There are also LJSpeech datasets for other languages - the format is pretty simple and can be used by anyone to train AI models.
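To show how simple the format is: an LJSpeech-style dataset is a folder of WAV clips plus a `metadata.csv` with one pipe-separated line per clip (clip id, transcription, normalized transcription). A minimal sketch of writing those lines follows; the helper names and the clip ids are made up for illustration, and the "normalization" here is just whitespace cleanup, which is an assumption on my part (the original dataset also spells out numbers and abbreviations).

```python
def ljspeech_line(clip_id, text):
    """Format one metadata.csv line: id|transcription|normalized transcription."""
    # The pipe is the field delimiter, so it must not appear in the text.
    cleaned = " ".join(text.replace("|", " ").split())
    # Assumption: reuse the cleaned text as the "normalized" column.
    return f"{clip_id}|{cleaned}|{cleaned}"


def build_metadata(clips):
    """clips: iterable of (clip_id, transcription) pairs, one per WAV file."""
    return "\n".join(ljspeech_line(cid, text) for cid, text in clips) + "\n"


if __name__ == "__main__":
    # Hypothetical clip ids; each would correspond to wavs/<clip_id>.wav
    example = [
        ("book_0001", "It was a bright cold day in April."),
        ("book_0002", "The clocks were   striking thirteen."),
    ]
    print(build_metadata(example), end="")
```

Writing the returned string to `metadata.csv` next to a `wavs/` folder gives the layout that training tools expecting LJSpeech generally look for.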

The result is pretty impressive for "offline only".

em-bee

That is interesting! Can you describe how that transcription process works? If I have an audiobook and the corresponding ebook/PDF, isn't that already transcribed? Or does transcription here mean something else?

I'd also be happy to use an existing voice (the English one from Keith Ito sounds pleasant enough), but I am confused about how to use it to read a book. There is code for a model that learns to synthesize speech from the data: https://github.com/keithito/tacotron. What I don't see is how to get at the end result, the trained model. I would hope that is also available somewhere, so I can just use it to read something.