I'm very interested in this space. Forgive my ignorance, but what makes this suitable for Chinese voices but not for English voices?

From README:

> This repository is forked from Real-Time-Voice-Cloning which only support English.

https://github.com/CorentinJ/Real-Time-Voice-Cloning