For those looking for something not hosted by a megacorp, check out Mozilla's Text to Speech project: https://github.com/mozilla/TTS/blob/master/README.md

Audio Samples: https://soundcloud.com/user-565970875

If you're interested in helping improve this, spread the word about Mozilla's Common Voice project, a public speech corpus that is easy to contribute to and which makes high-caliber TTS and transcription possible outside of walled gardens: https://voice.mozilla.org

Mozilla DeepSpeech also does Speech to Text surprisingly well: https://github.com/mozilla/DeepSpeech
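If you want a feel for the DeepSpeech inference API, here's a minimal sketch using the deepspeech Python package (pip install deepspeech). The constructor arguments have changed across releases, so this follows the newer API, and the model/scorer filenames are placeholders for whatever release you download:

    import wave
    import numpy as np
    import deepspeech

    # Placeholder paths -- substitute the model files shipped with a release.
    model = deepspeech.Model("deepspeech-models.pbmm")
    model.enableExternalScorer("deepspeech-models.scorer")

    # DeepSpeech expects 16 kHz, 16-bit, mono PCM audio.
    with wave.open("speech.wav", "rb") as w:
        audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

    print(model.stt(audio))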

Worth noting that a big chunk of the core TTS code here builds on tools from other researchers like Ryuichi Yamamoto and Keith Ito [0], and their own implementations are great to check out as well.

The best quality I have heard in OSS is probably [1], from Ryuichi, using the Tacotron 2 implementation of Rayhane Mamah [2], which is loosely what NVIDIA based some of their recent baseline code on as well [3][4].

There's also a Colab notebook for this stuff, so you can try it directly without any pain: https://colab.research.google.com/github/r9y9/Colaboratory/b...

I also have my own pipeline for this (using some utilities from the above authors, plus a lot of my own hacks) for a forthcoming paper, released here: https://github.com/kastnerkyle/representation_mixing/tree/ma... (see the minimal demo). It has pretty fast sampling, but the audio quality is not as high as WaveNet's. I'd really like to tie it in with WaveGlow [3], but that's still a work in progress for me.
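For reference, plugging a mel frontend into WaveGlow is conceptually a single call. Here's a rough sketch based on the inference script in [3] (the checkpoint path, mel shape, and sigma value are just assumptions on my part):

    import torch

    # NVIDIA's published checkpoints store the model object under the 'model' key [3].
    waveglow = torch.load("waveglow_checkpoint.pt")["model"]
    waveglow = waveglow.remove_weightnorm(waveglow)
    waveglow.cuda().eval()

    # Stand-in for what a Tacotron-style frontend would produce:
    # mel spectrogram of shape (batch, n_mel_channels, frames).
    mel = torch.randn(1, 80, 500).cuda()

    with torch.no_grad():
        audio = waveglow.infer(mel, sigma=0.6)  # (batch, samples)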

NOTE: none of these have voice adaptivity per se, but given a model which already trains well, plus a multispeaker dataset with speaker IDs such as VCTK, a lot of things become possible, since getting a baseline model and data pipeline for TTS is the hard part.
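To make the multispeaker point concrete: the usual trick is an embedding lookup on the speaker ID, broadcast along time and concatenated onto the encoder states, so the decoder attends over speaker-conditioned representations. A toy PyTorch sketch, not anyone's actual implementation (all sizes invented; VCTK has around 109 speakers):

    import torch
    import torch.nn as nn

    class SpeakerConditioning(nn.Module):
        def __init__(self, n_speakers=109, enc_dim=512, spk_dim=64):
            super().__init__()
            self.spk_embed = nn.Embedding(n_speakers, spk_dim)

        def forward(self, encoder_outputs, speaker_ids):
            # encoder_outputs: (batch, time, enc_dim); speaker_ids: (batch,)
            spk = self.spk_embed(speaker_ids)  # (batch, spk_dim)
            spk = spk.unsqueeze(1).expand(-1, encoder_outputs.size(1), -1)
            # (batch, time, enc_dim + spk_dim) -- speaker identity now rides
            # along with every encoder state the decoder attends to.
            return torch.cat([encoder_outputs, spk], dim=-1)

    enc = torch.randn(2, 100, 512)         # fake encoder states
    ids = torch.tensor([3, 47])            # fake speaker IDs
    out = SpeakerConditioning()(enc, ids)  # shape: (2, 100, 576)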

[0] https://github.com/keithito/tacotron

[1] https://r9y9.github.io/blog/2018/05/20/tacotron2/

[2] https://github.com/Rayhane-mamah/Tacotron-2

[3] https://github.com/NVIDIA/waveglow

[4] https://github.com/NVIDIA/tacotron2