Can anyone provide a simple explanation or example that illustrates why Yann was wrong? I’m highly predisposed to take his side in this, but I wonder if I could be missing something.

TLDR:

LeCun - ML systems are biased when their datasets are biased. But unlike deployment to real-world problems, I don't see any ethical obligation to use "unbiased" datasets for pure research or for tinkering with models.

Gebru - This is wrong, this is hurtful to marginalized people and you need to listen to them. Watch my tutorial for an explanation.

Headlines from the tutorial (which Gebru didn't even link herself):

- The CV community is largely homogeneous and has very few black people.
- A bunch of startups purport to use CV to predict IQ, hiring suitability, etc. Marginalized people don't work at these companies, and there's no legal vetting for fairness before these platforms are deployed.
- Gender classification in facial analysis is most accurate on lighter-skinned men and least accurate on darker-skinned women.
- Datasets are usually white/male.
- Most object detection models are biased towards Western concepts (e.g. marriage).
- Crash test dummies are representative of males, so women and children are overrepresented in car crash injuries.
- Nearest-neighbor image search is unfair because of automation bias and surveillance bias.
- China is using face detection to surveil ethnic minorities.
- Amazon's face recognition, sold to police, had the same biases (greater difficulty distinguishing between black women).

Now, I largely agree with what Gebru said in the tutorial. So does LeCun, who explicitly agreed a number of times that biased datasets/models should never be used for deployed solutions.

But it's a huge leap in logic to then demand that every research dataset be "unbiased". It's like criticizing someone for using exclusively male Lego figures to storyboard a movie shoot, or attacking a Chinese researcher because they trained a generative model only on Chinese faces and none of the outputs looked anything like me.

That being said, I'd be open to being convinced if she had made any effort to show that "use of biased datasets in research" is correlated with "biased outcomes in real-world production deployments". But she didn't, which is why her criticism of LeCun smacks of cheap point-scoring rather than genuine debate (a criticism I made of Twitter generally the last time this topic came up).

Do you believe that industry uses pre-trained models that researchers release?

Do you believe that industry uses pre-made datasets that researchers promote in their work?

Would a yes to the above two questions be sufficient to show that "use of biased datasets in research" is correlated with "biased outcomes in real-world production deployments"?
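To make the mechanism concrete, here's a minimal sketch of how that pipeline typically looks (I'm using torchvision's ImageNet-pretrained ResNet-50 purely as an example; any researcher-released checkpoint works the same way):

```python
import torch
from torchvision import models

# A production team pulls a researcher-released checkpoint off the shelf.
# Whatever skew exists in the training data (here, ImageNet) ships with
# the weights; nobody downstream re-audits the original dataset.
model = models.resnet50(pretrained=True)
model.eval()

# Typical downstream use: freeze the backbone and fine-tune a small head
# on a narrow in-house dataset. The learned representations, biases and
# all, survive into the deployed system intact.
for param in model.parameters():
    param.requires_grad = False
model.fc = torch.nn.Linear(model.fc.in_features, 2)  # e.g. a hypothetical binary screening task
```

If the answer to both questions is yes, this is the path by which a research artifact quietly becomes a production dependency.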

Regardless of the answers, wouldn't the problem be laid at the feet of "industry", and not the researchers?

Should cryptography researchers backdoor their own papers because terrorists or pedophiles might use them?

A major aspect of crypto research today is crypto UX and making crypto systems that are difficult to misuse. There are academics who actively work on these issues. They aren't the only academics obviously, but they exist.

Building ML systems that are difficult to misuse is underexplored, and Timnit is one of the relatively few researchers actively doing work in this area.

>A major aspect of crypto research today is crypto UX and making crypto systems that are difficult to misuse.

I'm intrigued by this. Any names (projects/people/protocols) come to mind?

Tink (https://github.com/google/tink) and Age (https://github.com/FiloSottile/age) are the obvious examples, although I think to some extent even things like the Signal Protocol apply.

I'd call them both examples of applied cryptography research. I think these projects compare very, very closely to applied ML research:

They come out of industry research labs and are worked on by respected experts, usually with some academic involvement; ultimately you end up with an artifact beyond just a paper, something that is useful and improves upon the status quo.

I'm admittedly not a total expert, so I don't know how far down to the level of crypto "primitives" this kind of work goes. But I believe there is some effort to pick primitives that are difficult to "mess up" (think "bad primes"), and I know Tink actively prevents you from making bad choices in the cases where you are forced to make one.
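To make that concrete, here's roughly what the Tink quickstart looks like in Python (a sketch from memory of the tink-py API, so exact names may vary by version). The interesting part is what's absent: you never choose an IV, a block mode, or a padding scheme, which are exactly the knobs people historically got wrong:

```python
import tink
from tink import aead

# Register the AEAD (authenticated encryption) primitives once at startup.
aead.register()

# Keys come from a vetted template; there is no API surface for picking
# an IV, mode, or padding, the classic foot-guns of hand-rolled AES.
keyset_handle = tink.new_keyset_handle(aead.aead_key_templates.AES256_GCM)
primitive = keyset_handle.primitive(aead.Aead)

ciphertext = primitive.encrypt(b'secret payload', b'associated data')
plaintext = primitive.decrypt(ciphertext, b'associated data')
```

The design choice is to ship secure defaults as the only path, rather than documenting the safe path and hoping users follow it.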

Even more broadly, just consider any tptacek (who I should clarify is *not* a researcher, lest he correct me) post on PGP/GPG email, or people like Matt Green (http://mattsmith.de/pdfs/DevelopersAreNotTheEnemy.pdf).

Edit: Some poking around also brought up this person: https://yaseminacar.de/, who has some interesting papers on similar subjects.