On Tue, Feb 18, 2014 at 9:34 AM, Samuel Klein <meta.sj(a)gmail.com> wrote:
Why do you want categories in the first place? Why not extract
whatever semantic meaning you need (e.g., about gender) by
parsing the sentences in each article?
Because for most people gender is a private matter which never makes it
into their article because, being a private matter, there are no reliable
sources about it?
Coming from a Western, English-language point of view it's very easy to
create structures that declare groups of people such as fa'afafine
incapable of existing.
... so many assumptions you just made there :-)
Yes, but I happen to know they're all true, because I was speaking of
myself.
Why is this a problem?
The attribute "gender according to DNB" is a) useful historical data,
b) verifiable, and c) easy to add to wikidata. I believe you can have
"DNB-gender" as one of the variations on the global "gender"
attribute. Most articles (unless they are talking about the DNB
specifically) would likely refer to the global attribute. But this
way you can have both datasets globally accessible. Then after the
import is done, people can write bulk data-cleaning scripts to help
humans review those articles where the two differ. And in cases where
there is a years-long edit war about what the global attribute should
be, you can keep track of what the input source-data is from various
sources.
I'm primarily an en.wiki editor and frankly don't care about wikidata,
except as it affects en.wiki.
What I am sure of is that 'gender' on en.wiki defaulting to DNB-gender
unless the individual has spoken about their gender in reliable sources is
inappropriate. Not only does it breach WP:BLP, but by white-washing
minorities it is a travesty of [[Wikipedia:Systemic bias]].
cheers
stuart