Very interesting! Is the music an intentional blended track or an artifact of generation?

Very much intentional.

Background music makes misuse/abuse less likely (both intentional and unintentional).

Read more in our open discussion: https://github.com/coqui-ai/TTS/discussions/1036

You can probably run the output through Spleeter[1] and get rid of the background music very easily. Just throw more AI at the problem...

It's very hard to curb intentional misuse.

[1] https://github.com/deezer/spleeter
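
For anyone curious what the Spleeter suggestion looks like in practice, here is a rough sketch that shells out to Spleeter's command-line tool with its pretrained 2-stem model (vocals + accompaniment). The function names and the file name `tts_output.wav` are made up for illustration, and it assumes `spleeter` is installed and on your PATH. Spleeter was trained on music, so how cleanly it separates synthetic speech from the backing track is untested here.

```python
import shutil
import subprocess
from pathlib import Path


def spleeter_command(audio_path, output_dir="separated"):
    """Build the CLI call for Spleeter's pretrained 2-stem model."""
    return [
        "spleeter", "separate",
        "-p", "spleeter:2stems",  # splits into vocals + accompaniment
        "-o", str(output_dir),
        str(audio_path),
    ]


def isolate_speech(audio_path, output_dir="separated"):
    """Run Spleeter and return the path where the vocals stem should land."""
    if shutil.which("spleeter") is None:
        raise RuntimeError("spleeter not found; try `pip install spleeter`")
    subprocess.run(spleeter_command(audio_path, output_dir), check=True)
    # Spleeter writes stems to <output_dir>/<input file stem>/
    return Path(output_dir) / Path(audio_path).stem / "vocals.wav"


if __name__ == "__main__":
    # Hypothetical input file; replace with a real clip.
    print(isolate_speech("tts_output.wav"))
```

The command builder is kept separate from the runner so the exact invocation can be inspected without having Spleeter installed.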