Loosely related, but what's the state of the art in natural-sounding text-to-speech ML/AI models?

Tortoise-TTS (stylised as "TorToiSe") is pretty amazing: https://github.com/neonbjb/tortoise-tts