On Tue, Feb 18, 2014 at 9:34 AM, Samuel Klein <meta.sj@gmail.com> wrote:
> Why do you want categories in the first place?  Why not extract
> whatever semantic meaning you need (e.g., about genderbread) by
> parsing the sentences in each article?

Because for most people gender is a private matter, and precisely because it is private there are no reliable sources about it, so it never makes it into their article?

>> Coming from a Western, English-language point of view it's very easy to
>> create structures that declare groups of people such as fa'afafine incapable
>> of existing.
>
> ... so many assumptions you just made there :-)

Yes, but I happen to know they're all true, because I was speaking of myself.

> Why is this a problem?
> The attribute "gender according to DNB" is a) useful historical data,
> b) verifiable, and c) easy to add to wikidata. I believe you can have
> "DNB-gender" as one of the variations on the global "gender"
> attribute.  Most articles (unless they are talking about the DNB
> specifically) would likely refer to the global attribute.  But this
> way you can have both datasets globally accessible.  Then after the
> import is done, people can write bulk data-cleaning scripts to help
> humans review those articles where the two differ.  And in cases where
> there is a years-long edit war about what the global attribute should
> be, you can keep track of what the input source-data is from various

I'm primarily an en.wiki editor and frankly don't care about wikidata, except as it affects en.wiki.

What I am sure of is that 'gender' on en.wiki defaulting to DNB-gender unless the individual has spoken about their gender in reliable sources is inappropriate. Not only does it breach WP:BLP, but by white-washing minorities it is a travesty of [[Wikipedia:Systemic bias]].