What does HackerNews think of spleeter?
Deezer source separation library including pretrained models.
I have tried many sources and methods over the years and settled on spleeter [0]. It works well even for 10+ minute songs, across styles ranging from flamenco to heavy metal.
You can find the stems of many songs online, and some artists offer remix packs (which you can easily obtain by contacting their label/publisher).
You can also try splitting tracks yourself with some ML: https://github.com/deezer/spleeter
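For context, spleeter's documented Python API makes a basic two-stem split just a few lines (a minimal sketch; `song.mp3` and the output directory are placeholder names):

```python
from spleeter.separator import Separator

# Load the pretrained 2stems model (vocals + accompaniment).
separator = Separator('spleeter:2stems')

# Writes vocals.wav and accompaniment.wav under output/song/.
separator.separate_to_file('song.mp3', 'output/')
```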
It's very hard to curb intentional misuse.
- PDM: A modern Python package manager with PEP 582 support [1]
- Spleeter: Deezer source separation library including pretrained models [2]
---
https://github.com/facebookresearch/demucs
https://github.com/sigsep/open-unmix-pytorch
Yes, right now it only changes the icon for a tape, even though we actually track and save it at a per-second level. Obviously there is quite a bit of room for more uses of this in the future, but the icon is a start :-). Also notice, by the way, that the waveform is black for music and grey for speech - particularly handy in a jam session: you can see exactly when a take started and when it ended.
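As a rough illustration of that idea (hypothetical data; nothing here is the product's actual code, just the per-second labeling the comment describes):

```python
# One music/speech label per second, tracked at a per-second level.
labels = ["music", "music", "speech", "speech", "music", "music"]

# Color each one-second waveform segment: black for music, grey for speech.
colors = ["black" if label == "music" else "grey" for label in labels]

# A take boundary is wherever the label flips.
boundaries = [t for t in range(1, len(labels)) if labels[t] != labels[t - 1]]
```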
So called "blind source separation" (i.e. get multitracks from just one mic) is possible today, but with fairly audible artifacts. The most popular library in use today is Spleeter [1], which is based on Andreas Jansson et al.'s work at Spotify [2]. There are newer algorithms in academia, a good overview is provided at [3]. If you want to do something today, iZotope's RX is very good, and a great example that demonstrates how good old DSP engineering can dramatically reduce the unwanted artifacts even in new ML-based approaches.
Then again, my real question would be: why would you want a multitrack recording from your jam session? Is it to be able to further adjust the mix afterwards? Do you need the individual instrument tracks for practicing? Would be really interested in hearing your use case :-).
[1] https://github.com/deezer/spleeter
[2] https://scholar.google.com/citations?view_op=view_citation&h...
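For the "adjust the mix afterwards" use case, Spleeter's pretrained 4stems model gets partway to a multitrack, artifacts included; a minimal sketch using its documented Python API (`jam_session.wav` is a placeholder):

```python
from spleeter.separator import Separator

# The 4stems model separates a mix into vocals, drums, bass, and other.
separator = Separator('spleeter:4stems')

# Writes vocals.wav, drums.wav, bass.wav, other.wav under stems/jam_session/.
separator.separate_to_file('jam_session.wav', 'stems/')
```

Expect the artifacts mentioned above, particularly on the catch-all "other" stem.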
1. For sure, I was thinking something along the lines of a multiview VAE with encoders `q(z|audio, midi)` or `q(z|dx7_parameters)` into a shared latent z, and decoders `p(audio|midi, z)` and `p(dx7_parameters|z)` (see the sketch after this list).
2. Yeah, I have tried to pick apart Ableton files in the past, but the format is a bit of a nightmare; it might be easier to use source separation like https://github.com/deezer/spleeter to build your dataset!
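A rough PyTorch sketch of the multiview-VAE idea from point 1; all dimensions and layer sizes are made-up placeholders, and it is only meant to show the shared-latent wiring:

```python
import torch
import torch.nn as nn

# Made-up dimensions, purely for illustration.
AUDIO_DIM, MIDI_DIM, DX7_DIM, Z_DIM = 512, 128, 155, 64

class ViewEncoder(nn.Module):
    """Encodes one view into the parameters of q(z|view)."""
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, Z_DIM)
        self.logvar = nn.Linear(256, Z_DIM)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

class MultiviewVAE(nn.Module):
    def __init__(self):
        super().__init__()
        # q(z|audio, midi) and q(z|dx7_parameters) share one latent space.
        self.enc_audio_midi = ViewEncoder(AUDIO_DIM + MIDI_DIM)
        self.enc_dx7 = ViewEncoder(DX7_DIM)
        # p(audio|midi, z) and p(dx7_parameters|z).
        self.dec_audio = nn.Sequential(
            nn.Linear(Z_DIM + MIDI_DIM, 256), nn.ReLU(),
            nn.Linear(256, AUDIO_DIM))
        self.dec_dx7 = nn.Sequential(
            nn.Linear(Z_DIM, 256), nn.ReLU(),
            nn.Linear(256, DX7_DIM))

    def forward(self, midi, audio=None, dx7=None):
        # Encode from whichever view is available.
        if dx7 is not None:
            mu, logvar = self.enc_dx7(dx7)
        else:
            mu, logvar = self.enc_audio_midi(torch.cat([audio, midi], dim=-1))
        # Reparameterization trick.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # Decode both views from the shared latent.
        audio_hat = self.dec_audio(torch.cat([z, midi], dim=-1))
        dx7_hat = self.dec_dx7(z)
        return audio_hat, dx7_hat, mu, logvar
```

Training would then mix the reconstruction losses of both views with the usual KL term, so either input route lands in the same latent space.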
Search your query in YouTube using https://github.com/youkaclub/youka-youtube
Search lyrics using https://github.com/youkaclub/youka-lyrics
Split the vocals from instruments using https://github.com/deezer/spleeter
Align text to voice (the hardest part) using some private API
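Wired together, the pipeline looks roughly like this (a sketch: only the spleeter call is a real documented API; `search_youtube`, `fetch_lyrics`, and `align_lyrics` are hypothetical placeholders for the tools above, the last one necessarily so since the alignment API is private):

```python
from pathlib import Path
from spleeter.separator import Separator

def search_youtube(query):
    # Placeholder for youka-youtube; would return a downloaded audio path.
    raise NotImplementedError

def fetch_lyrics(query):
    # Placeholder for youka-lyrics; would return the song's lyrics text.
    raise NotImplementedError

def align_lyrics(vocals_path, lyrics):
    # Placeholder: per the comment, alignment uses a private API.
    raise NotImplementedError

def make_karaoke(query):
    audio_path = search_youtube(query)
    lyrics = fetch_lyrics(query)
    # Real spleeter API: 2stems writes vocals.wav and accompaniment.wav
    # under stems/<input file name>/.
    Separator('spleeter:2stems').separate_to_file(audio_path, 'stems/')
    stem_dir = Path('stems') / Path(audio_path).stem
    timings = align_lyrics(stem_dir / 'vocals.wav', lyrics)
    # accompaniment.wav plus the timed lyrics make the karaoke track.
    return stem_dir / 'accompaniment.wav', timings
```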
spleeter project: https://github.com/deezer/spleeter