Nice work! Congrats on hitting the app store. I’m excited to use this for our next hours-long jam session.
How does the instrument detection work? Does it do anything else besides change the icon for a tape?
It would be so cool to just show up at a jam session, hit record on my iPhone, and get all that multitrack goodness without having to mic everything separately. But I realize that’s asking a lot from the AI gods :-).
Yes, right now it only changes the icon for a tape, even though we actually track and save it at a per-second level. Obviously there is quite a bit of room to make more use of this in the future, but the icon is a start :-). Also notice, by the way, that the waveform is black for music and grey for speech - particularly handy in a jam session: you can see exactly when a take started and when it ended.
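Conceptually, those per-second labels are just a parallel array alongside the audio, and the waveform coloring falls out of a lookup. A toy sketch (not our actual implementation, just to illustrate the idea):

    # Toy sketch, not the app's real code: one speech/music
    # label per second of recorded audio.
    from enum import Enum

    class Label(Enum):
        SPEECH = "speech"
        MUSIC = "music"

    # e.g. the output of a per-second classifier over a short recording
    labels = [Label.SPEECH, Label.SPEECH, Label.MUSIC, Label.MUSIC]

    def waveform_color(second: int) -> str:
        """Grey waveform for speech, black for music."""
        return "#888888" if labels[second] is Label.SPEECH else "#000000"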
So-called "blind source separation" (i.e. getting multitracks from just one mic) is possible today, but with fairly audible artifacts. The most popular library in use today is Spleeter [1], which is based on Andreas Jansson et al.'s work at Spotify [2]. There are newer algorithms in academia; a good overview is provided at [3]. If you want to do something today, iZotope's RX is very good, and a great example of how good old DSP engineering can dramatically reduce the unwanted artifacts even in newer ML-based approaches.
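If you want to play with it yourself, Spleeter's Python API boils down to a few lines. This uses its pretrained 4-stem model; the file paths are placeholders:

    # pip install spleeter
    from spleeter.separator import Separator

    # Load the pretrained 4-stem model (vocals, drums, bass, other).
    separator = Separator('spleeter:4stems')

    # Writes one audio file per stem into the output directory.
    separator.separate_to_file('jam_session.wav', 'stems/')

Expect the artifacts I mentioned to be most audible on dense mixes, and note that anything outside vocals/drums/bass (a horn section, say) will just land in the "other" stem.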
Then again, my real question would be: why would you want a multitrack recording from your jam session? Is it to be able to further adjust the mix afterwards? Do you need the individual instrument tracks for practicing? I'd be really interested in hearing your use case :-).
[1] https://github.com/deezer/spleeter
[2] https://scholar.google.com/citations?view_op=view_citation&h...