What does HackerNews think of omnizart?
Omniscient Mozart, being able to transcribe everything in the music, including vocal, drum, chord, beat, instruments, and more.
I suppose these models are trained on Western/pop music, so they may not work well on ethnic music.
[1] https://github.com/Music-and-Culture-Technology-Lab/omnizart
[2] https://github.com/magenta/mt3
To do this, you can use a source/stem separation model like Spleeter (https://github.com/deezer/spleeter) and then run the Basic Pitch model (or any other MIDI transcription model) on the separated stems. There are others you can try that may yield better results, for example omnizart (https://github.com/Music-and-Culture-Technology-Lab/omnizart).
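A minimal sketch of that two-step pipeline, written in Python because both tools ship as Python packages. It assumes `pip install spleeter basic-pitch` and uses Spleeter's `Separator("spleeter:2stems")` with `separate_to_file`, plus Basic Pitch's `basic_pitch.inference.predict`. The helper `spleeter_stem_paths` is hypothetical, and the output layout it predicts (Spleeter's default `{filename}/{instrument}.{codec}` naming) is an assumption to check against the versions you install.

```python
"""Sketch: separate stems with Spleeter, then transcribe each stem to MIDI
with Basic Pitch. Both libraries are imported lazily so the path helper can
be used without them installed."""
from __future__ import annotations

from pathlib import Path


def spleeter_stem_paths(audio_path: str, out_dir: str, stems: tuple[str, ...]) -> list[Path]:
    """Hypothetical helper: predict where Spleeter writes each stem, assuming
    its default filename_format "{filename}/{instrument}.{codec}" with WAV output."""
    name = Path(audio_path).stem
    return [Path(out_dir) / name / f"{stem}.wav" for stem in stems]


def separate_and_transcribe(audio_path: str, out_dir: str = "stems") -> list[Path]:
    # Heavy dependencies are imported here, only when the pipeline actually runs.
    from spleeter.separator import Separator
    from basic_pitch.inference import predict

    # Step 1: stem separation. "spleeter:2stems" splits a mix into vocals + accompaniment.
    separator = Separator("spleeter:2stems")
    separator.separate_to_file(audio_path, out_dir)

    # Step 2: MIDI transcription of each separated stem.
    midi_files = []
    for stem_path in spleeter_stem_paths(audio_path, out_dir, ("vocals", "accompaniment")):
        # predict() returns (model_output, midi_data, note_events);
        # midi_data is a pretty_midi.PrettyMIDI object that can be written to disk.
        _, midi_data, _ = predict(str(stem_path))
        midi_path = stem_path.with_suffix(".mid")
        midi_data.write(str(midi_path))
        midi_files.append(midi_path)
    return midi_files


if __name__ == "__main__":
    import sys

    for path in separate_and_transcribe(sys.argv[1]):
        print(path)
```

Run it with an audio file as the only argument. To try a different transcription model (omnizart, for instance), the loop body in step 2 is the part to swap out.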
Either way, the keywords to search for are "MIDI transcription" and "stem separation"; they should help you find more models to try for both steps. Good luck! :)
EDIT: Oh, it looks like there's even a stem separation leaderboard on Papers with Code, neat: https://paperswithcode.com/task/music-source-separation