This seems like a step in the right direction; YouTube comment spam was getting out of control.

Nothing but respect for the engineers at YouTube working on fighting back against it. Seems like a very difficult classification problem.

Is it very difficult? So many of these spam accounts follow the exact same format, with names following some variation of "Free stuff Telegram: +1234…" or "s3xY p1cs c0mE 2 mY chAnNeL", etc. Tons of them post these obviously made-up threads about how "Dr John Smith" has helped them triple their crypto investments.

There are probably a dozen or so templates that many of these spammers and scammers use, and if you can spot them in a second, it's not hard to train a model to recognize them too.
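To make the point concrete, here is a minimal sketch of that "dozen templates" idea: a few hand-written patterns covering the formats described above. The exact patterns are illustrative guesses on my part, not a vetted rule set.

```python
import re

SPAM_PATTERNS = [
    # "Free stuff, Telegram/WhatsApp: +1234..." style names and comments
    re.compile(r"(telegram|whats\s*app)\W{0,5}\+?\d{6,}", re.IGNORECASE),
    # Leetspeak come-ons like "s3xY p1cs c0mE 2 mY chAnNeL"
    re.compile(r"[s5][e3]xy?\s*p[i1]c", re.IGNORECASE),
    # Fake testimonials about "Dr So-and-so" who helped someone with crypto
    re.compile(
        r"\bdr\.?\s+\w+(\s+\w+){0,3}\s+helped\s+(me|us|them)\b.{0,80}"
        r"(crypto|bitcoin|forex|invest)",
        re.IGNORECASE | re.DOTALL,
    ),
]

def looks_like_spam(author: str, comment: str) -> bool:
    """Return True if the author name or comment text matches a known template."""
    text = f"{author} {comment}"
    return any(p.search(text) for p in SPAM_PATTERNS)

# Quick check against the examples above:
print(looks_like_spam("Free stuff Telegram: +123456789", "nice video"))              # True
print(looks_like_spam("Jane", "Dr John Smith helped me triple my crypto portfolio")) # True
print(looks_like_spam("Jane", "Thanks, this fixed my build."))                       # False
```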

It is out of control because YouTube does not care to fix it, not because it's some insurmountably hard problem that one of the top AI companies in the world just can't figure out.

I can totally see how, if you assume it's simple and easy to fix, it looks like just another case of people not caring. Yet here we are, replying to a post about the not-ideal steps the company is taking. And further, this is a problem for multiple large companies with user commenting and chat, from Google through Roblox, Twitch, Twitter and beyond. There are a few things that make naive solutions impractical:

* Cost to build them.

* Cost to keep them up to date in a highly adversarial environment. Note this might mean scrapping the solution and starting again.

* Cost of running them at huge scale.

It's particularly important to understand that the attack surface is so large, and the cost to the opposition so low, that people will defeat your approach recreationally just to spam obscenities.

Guy makes concrete suggestions; you respond with abstract rebuttals that add up to "it's too hard". All he's saying is the low-hanging fruit is pluckable; pluck it.

Guy doesn't understand why perceived low-hanging fruit hasn't been plucked, and I explained why it's not that simple or low-hanging. That's fine; naive filtering (now with added ML!) is the first idea everyone comes up with. Then they learn about the Scunthorpe problem and beyond.
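For anyone unfamiliar, a tiny illustration of the Scunthorpe problem: a naive substring blocklist flags harmless text because a banned word happens to sit inside a longer innocent word, while trivial obfuscation slips straight past it.

```python
BLOCKLIST = ["cunt", "sex"]

def naive_filter(comment: str) -> bool:
    """Return True if the comment trips the naive substring blocklist."""
    lowered = comment.lower()
    return any(word in lowered for word in BLOCKLIST)

print(naive_filter("I grew up in Scunthorpe"))       # True  -- false positive
print(naive_filter("Essex and Middlesex results"))   # True  -- false positive
print(naive_filter("s3xy p1cs, c0me 2 my channel"))  # False -- trivially evaded
```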

Sure, it's hard to get around a sophisticated operator, but most of the spam on YT that I see is not sophisticated.

You wouldn't even need ML for the majority of them.

There is actually an open-source tool[1] that can do this for individual creators. That tool finds a lot of spam that YT itself does not, and it's developed by a single guy.
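As a rough sketch (not the linked tool's actual code), this is roughly what a per-creator scan might look like: pull comment threads via the YouTube Data API v3 and run each one through a simple check like the hypothetical looks_like_spam() above. Pagination, replies and quota handling are omitted.

```python
from googleapiclient.discovery import build

def scan_video_comments(api_key: str, video_id: str, is_spam) -> None:
    """Print comments that the supplied is_spam(author, text) predicate flags."""
    youtube = build("youtube", "v3", developerKey=api_key)
    response = youtube.commentThreads().list(
        part="snippet",
        videoId=video_id,
        maxResults=100,
        textFormat="plainText",
    ).execute()
    for item in response.get("items", []):
        snippet = item["snippet"]["topLevelComment"]["snippet"]
        author = snippet["authorDisplayName"]
        text = snippet["textDisplay"]
        if is_spam(author, text):
            print(f"Possible spam from {author!r}: {text[:80]}")

# e.g. scan_video_comments("MY_API_KEY", "SOME_VIDEO_ID", looks_like_spam)
```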

So there definitely is plenty of opportunity to pluck the low-hanging fruit.

But Google probably has hundreds of PhDs developing some kind of ML that ends up performing worse than some simple regexes, because you don't get promoted for simple solutions.

[1]: https://github.com/ThioJoe/YT-Spammer-Purge