Very interesting! Is the music an intentionally blended-in track or an artifact of generation?
Very much intentional.
Background music makes misuse/abuse, both intentional and unintentional, less likely.
Read more about it in our open discussion: https://github.com/coqui-ai/TTS/discussions/1036
You can probably run the output through Spleeter[1] and get rid of the background music very easily. Just throw more AI at the problem...
It's very hard to curb intentional misuse.