Dear all, There have been announcements about the Structured Data project on Commons, which is intended to make it easier to view, search, edit, organize, and re-use the metadata on media files. This is clearly of great value to researchers and developers in image recognition, who will gain a large repository of tagged image files on which to train their AI systems.
There is, however, an ethical issue here. Readers will recall that Google discovered its facial recognition software was prone to classifying African-American faces as "gorilla", because the training dataset had not contained enough non-white faces -- see, for example, The Verge: https://www.theverge.com/2018/1/12/16882408/google-racist-gorillas-photo-rec...
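The mechanism behind that failure is simple to illustrate: if one class is badly underrepresented in the training set, the model sees too few examples of it to learn reliably. A minimal sketch of the kind of balance audit a data re-user might run before training (the function name, the 5% cutoff, and the toy labels are all illustrative assumptions, not anything Commons or WMF provides):

```python
from collections import Counter

def audit_label_balance(labels, threshold=0.05):
    """Report each label's count and share of the dataset, flagging
    labels whose share falls below `threshold` (a hypothetical cutoff)."""
    counts = Counter(labels)
    total = sum(counts.values())
    report = {}
    for label, n in counts.items():
        share = n / total
        report[label] = (n, share, share < threshold)
    return report

# Toy example: a skewed dataset where one class is badly underrepresented
labels = ["cat"] * 900 + ["dog"] * 95 + ["rabbit"] * 5
for label, (n, share, flagged) in audit_label_balance(labels).items():
    note = "  <-- underrepresented" if flagged else ""
    print(f"{label}: {n} images ({share:.1%}){note}")
```

A real audit would of course use demographically meaningful categories rather than toy labels, which is precisely where the diversity question below bites: such categories have to exist in the metadata before anyone can measure them.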
Is the Foundation confident that the Commons repository is sufficiently diverse that it can ethically offer it to others as a source of training data?
Thrapostibongles
Hi Mister Thrapostibongles,
This is a good point and a valid consideration. WMF is starting to think about issues like this, and about what tools we have available to mitigate the unintended consequences of AI technology (even in cases where we're not building the AI ourselves, but rather providing training data). I recently wrote a white paper on this topic, https://meta.wikimedia.org/wiki/File:Ethical_and_human-centererd_AI_-_Wikimedia_Research_2030.pdf, in consultation with some other folks in research, product, and legal. It isn't policy (yet), just a proposal and a conversation starter. Feedback and discussion are welcome!
Best, Jonathan
On Sun, May 12, 2019 at 1:50 AM Mister Thrapostibongles <thrapostibongles@gmail.com> wrote: