Structured data ethical implications - Wikimedia-l

11 May 2019


      Dear all,
There have been announcements about the Structured data project on Commons,
that is intended to make it easier to view, search, edit, organize and
re-use the metadata on media.  This is clearly of great value to
researchers and developers in image recognition, who will have a large
repository of tagged image files to train their AI implementations on.
There is however an ethical issue here.  Readers will recall that Google
discovered that its facial regonition software was prone to classifying
African-American faces as "gorilla", because the training dataset had not
contained enough non-white faces -- see for example The Verge
https://www.theverge.com/2018/1/12/16882408/google-racist-gorillas-photo-rec...
Is the Foundation confident that the Commons repository is sufficiently
diverse that it can ethically offer it to others as a source of training
data?
Thrapostibongles