Yep, you're asking a lot of the AI gods, but then again if you don't have big dreams, how is big change ever going to happen? :D

Yes, right now it only changes the icon for a tape, even though we actually track and save it at a per-second level. Obviously there is quite a bit of room to make more use of this in the future, but the icon is a start :-). Also notice, by the way, that the waveform is black for music and grey for speech, which is particularly handy in a jam session: you can see exactly when a take started and when it ended.
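
To make the "when a take started and when it ended" part concrete, here is a minimal Python sketch of how per-second labels could be turned into take boundaries. This is only an illustration, not the actual implementation: the label strings, the list-of-labels format and the `find_takes` helper are all assumptions made up for this example.

```python
from itertools import groupby


def find_takes(labels, target="music", min_len=1):
    """Return (start_second, end_second) pairs for each run of `target`.

    `labels` is assumed to hold one label per second of audio.
    The end second is exclusive, so (3, 8) covers seconds 3 through 7.
    """
    takes = []
    position = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        # Keep only runs of the target label that are long enough.
        if label == target and length >= min_len:
            takes.append((position, position + length))
        position += length
    return takes


# Hypothetical session: chatter, a take, more chatter, another take.
labels = ["speech"] * 3 + ["music"] * 5 + ["speech"] * 2 + ["music"] * 4
print(find_takes(labels))  # [(3, 8), (10, 14)]
```

The `min_len` parameter is there so that very short blips (someone noodling for a second or two) could be ignored if that turns out to be useful.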

So-called "blind source separation" (i.e. getting multitracks from just one mic) is possible today, but with fairly audible artifacts. The most popular library in use today is Spleeter [1], which is based on Andreas Jansson et al.'s work at Spotify [2]. There are newer algorithms in academia; a good overview is provided at [3]. If you want to do something today, iZotope's RX is very good, and it's a great example of how good old DSP engineering can dramatically reduce the unwanted artifacts even in new ML-based approaches.

Then again, my real question would be: why would you want a multitrack recording from your jam session? Is it to be able to further adjust the mix afterwards? Do you need the individual instrument tracks for practicing? Would be really interested in hearing your use case :-).

[1] https://github.com/deezer/spleeter

[2] https://scholar.google.com/citations?view_op=view_citation&h...

[3] https://sigsep.github.io