I’m suddenly interested in making a neural network that can mute a known reference track in an audio stream while leaving the surrounding audio intact. Doesn’t something like this already exist? It seems like Instagram and YouTube could pretty easily leave audio-edited videos online while still addressing the copyright violation, without resorting to takedowns or account bans.

Not the exact thing you're after, but you may be interested in Spleeter[0], which can pick out individual instruments from a given track with surprising accuracy. Deezer uses it to aid in song recognition. In your case, you could in theory separate out the whole copyrighted track as its own stem and keep everything else.

[0] https://github.com/deezer/spleeter
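
For what it's worth, if the reference really is known sample-for-sample, a non-neural baseline is to line it up with the stream and subtract it. Below is a rough sketch of that idea, not anything Spleeter does: it assumes mono float arrays at the same sample rate and a clean, unprocessed copy of the reference in the mix. The function name `cancel_reference` and the synthetic demo signals are made up for illustration.

```python
import numpy as np

def cancel_reference(mix, ref):
    """Subtract the best-aligned, best-scaled copy of `ref` from `mix`.

    Assumes `ref` appears once in `mix` as an exact (only scaled) copy.
    """
    # Cross-correlate to find the offset where the reference starts.
    corr = np.correlate(mix, ref, mode="valid")
    lag = int(np.argmax(np.abs(corr)))
    segment = mix[lag:lag + len(ref)]
    # Least-squares gain g minimizing ||segment - g * ref||^2.
    gain = float(np.dot(segment, ref) / np.dot(ref, ref))
    out = mix.copy()
    out[lag:lag + len(ref)] -= gain * ref
    return out, lag, gain

# Synthetic demo: noise stands in for both the "song" and the background.
rng = np.random.default_rng(0)
ref = rng.standard_normal(2000)                # the known reference
background = 0.1 * rng.standard_normal(10000)  # audio we want to keep
mix = background.copy()
mix[3000:5000] += 0.5 * ref                    # reference buried in the stream

cleaned, lag, gain = cancel_reference(mix, ref)
```

In practice re-encoding, room reverb, pitch or speed changes, and people talking over the music all break the "exact copy" assumption, which is presumably where a learned model would have to take over.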