What does HackerNews think of spleeter?

Deezer source separation library including pretrained models.

Language: Python

#10 in Python
#1 in TensorFlow
I tried to use it, but I had some issues, as did others in the thread.

I have tried many sources and methods over the years and settled on spleeter [0]. It works well even for 10+ minute songs, in styles varying from flamenco to heavy metal.

[0] https://github.com/deezer/spleeter

It's not too hard these days with open source BPM detection and stem separation libraries: https://github.com/deezer/spleeter
There are a few AI-based tools for splitting music into stems. An open source one I know of is https://github.com/deezer/spleeter
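The comments above don't name a specific BPM library, so here's a minimal sketch assuming librosa for the beat-tracking half and Spleeter for the stems; the file names are hypothetical:

    import librosa
    from spleeter.separator import Separator

    # Beat tracking: librosa is one open source option for BPM detection.
    y, sr = librosa.load('song.mp3')                    # hypothetical input
    tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
    print('Estimated tempo (BPM):', tempo)

    # Stem separation: split the same file into vocals/drums/bass/other.
    Separator('spleeter:4stems').separate_to_file('song.mp3', 'stems/')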
Rick’s explanation is that he has friends at music labels who give him the original stems (isolated tracks) for educational purposes

you can find the stems of many songs online, and some artists offer remix packs (which you can often obtain by contacting their label/publisher)

you can also try splitting them yourselves with some ML: https://github.com/deezer/spleeter
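if you want to try it, a rough sketch of the command-line flow (the exact invocation varies by Spleeter version; older releases pass the input file via -i):

    pip install spleeter
    # Pretrained models download on first run.
    spleeter separate -p spleeter:4stems -o output song.mp3
    # -> output/song/{vocals,drums,bass,other}.wav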

You can probably run the output through Spleeter[1] and get rid of the background music very easily. Just throw more AI at the problem...

It's very hard to curb intentional misuse.

[1] https://github.com/deezer/spleeter
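A minimal sketch of that idea with Spleeter's Python API, assuming the 2-stem model (output names follow Spleeter's default layout; the input path is hypothetical):

    from spleeter.separator import Separator

    # 2 stems: 'vocals' and 'accompaniment' (the background music).
    separator = Separator('spleeter:2stems')
    separator.separate_to_file('mixed.mp3', 'out/')
    # Keep out/mixed/vocals.wav; discard out/mixed/accompaniment.wav.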

Here are a few that haven't been mentioned yet:

- PDM: A modern Python package manager with PEP 582 support[1]

- Spleeter: Deezer source separation library including pretrained models[2]

---

[1]: https://github.com/pdm-project/pdm

[2]: https://github.com/deezer/spleeter

Yep, you're asking a lot of the AI gods, but then again if you don't have big dreams, how is big change ever going to happen? :D

Yes, right now it only changes the icon for a tape, even though we actually track and save it at a per-second level. Obviously there is quite a bit of room for more use of this in the future, but the icon is a start :-). Also, notice that the waveform is black for music and grey for speech - particularly handy in a jam session. You can see exactly when a take started and when it ended.

So-called "blind source separation" (i.e. getting multitracks from just one mic) is possible today, but with fairly audible artifacts. The most popular library in use today is Spleeter [1], which is based on Andreas Jansson et al.'s work at Spotify [2]. There are newer algorithms in academia; a good overview is provided at [3]. If you want to do something today, iZotope's RX is very good, and a great example of how good old DSP engineering can dramatically reduce the unwanted artifacts even in new ML-based approaches.
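For a feel of what that looks like in practice, here's a sketch against Spleeter's 2.x Python API, which can also separate an in-memory waveform directly (details vary by version; the file name is hypothetical):

    from spleeter.audio.adapter import AudioAdapter
    from spleeter.separator import Separator

    # Load the single-mic recording as a (samples, channels) array.
    loader = AudioAdapter.default()
    waveform, _ = loader.load('jam.wav', sample_rate=44100)

    # Returns a dict of stem name -> waveform array.
    stems = Separator('spleeter:4stems').separate(waveform)
    for name, audio in stems.items():  # vocals, drums, bass, other
        print(name, audio.shape)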

Then again, my real question would be: why would you want a multitrack recording from your jam session? Is it to be able to further adjust the mix afterwards? Do you need the individual instrument tracks for practicing? I'd be really interested in hearing your use case :-).

[1] https://github.com/deezer/spleeter

[2] https://scholar.google.com/citations?view_op=view_citation&h...

[3] https://sigsep.github.io

There is software like Spleeter [1] that can split songs into stems (separate drums, vocals, rhythm, etc.) using ML. I imagine it would be possible to adapt it for this. It might even work as-is if you only needed to isolate people's voices.

1. https://github.com/deezer/spleeter
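For reference, Spleeter ships three pretrained configurations (2, 4, and 5 stems); the largest also isolates piano. A one-line sketch of the CLI (older versions pass the input via -i):

    spleeter separate -p spleeter:5stems -o out song.mp3
    # -> out/song/{vocals,drums,bass,piano,other}.wav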

Not the exact thing you're after, but you may be interested in Spleeter[0], which can pick out individual instruments from a given track with surprising accuracy. Deezer uses it to aid in song recognition. In this case, you could theoretically pick out the part you want and leave everything else.

[0] https://github.com/deezer/spleeter

This is powered by Spleeter, an open source project from Deezer, repackaged with a UI: https://github.com/deezer/spleeter
Thanks for the feedback!

1. For sure, I was thinking something along the lines of a multiview VAE that gets as input either `f(z|audio, midi)` or `f(z|dx7_parameters)` and must produce as output `f(audio|midi,z)` or `f(dx7_parameters|z)`

2. Yeah, I have tried to pick apart Ableton files in the past, but the format is a bit of a nightmare; it might be easier to use source separation like https://github.com/deezer/spleeter to build your dataset!
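On the dataset idea, a hedged sketch (folder names are hypothetical; one Separator instance is reused so the model only loads once):

    from pathlib import Path
    from spleeter.separator import Separator

    separator = Separator('spleeter:4stems')  # load the model once
    for mix in Path('mixes').glob('*.mp3'):
        # Writes dataset/<track>/{vocals,drums,bass,other}.wav
        separator.separate_to_file(str(mix), 'dataset/')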

I'll add some explanation soon. Here's the main process:

Search for your query on YouTube using https://github.com/youkaclub/youka-youtube

Search for lyrics using https://github.com/youkaclub/youka-lyrics

Split the vocals from the instruments using https://github.com/deezer/spleeter (sketched below)

Align text to voice (the hardest part) using some private API
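For the Spleeter step above, a minimal sketch using the 2-stem model (file names are hypothetical; the accompaniment stem becomes the karaoke backing track):

    from spleeter.separator import Separator

    # vocals.wav helps with lyric alignment; accompaniment.wav is the backing.
    Separator('spleeter:2stems').separate_to_file('song.mp3', 'karaoke/')
    # -> karaoke/song/vocals.wav
    # -> karaoke/song/accompaniment.wav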