On Tue, Feb 18, 2014 at 9:34 AM, Samuel Klein <meta.sj@gmail.com> wrote:
Why do you want categories in the first place? Why not extract whatever semantic meaning you need (e.g., about genderbread) by parsing the sentences in each article?
Because for most people gender is a private matter which never makes it into their article, since, being a private matter, there are no reliable sources about it?
Coming from a Western, English-language point of view it's very easy to create structures that declare groups of people such as fa'afafine incapable of existing.
... so many assumptions you just made there :-)
Yes, but I happen to know they're all true, because I was speaking of myself.
Why is this a problem? The attribute "gender according to DNB" is a) useful historical data, b) verifiable, and c) easy to add to wikidata. I believe you can have "DNB-gender" as one of the variations on the global "gender" attribute. Most articles (unless they are talking about the DNB specifically) would likely refer to the global attribute. But this way you can have both datasets globally accessible. Then after the import is done, people can write bulk data-cleaning scripts to help humans review those articles where the two differ. And in cases where there is a years-long edit war about what the global attribute should be, you can keep track of what the input source-data is from various sources.
I'm primarily an en.wiki editor and frankly don't care about wikidata, except as it affects en.wiki.
What I am sure of is that 'gender' on en.wiki defaulting to DNB-gender unless the individual has spoken about their gender in reliable sources is inappropriate. Not only does it breach WP:BLP, but by white-washing minorities it is a travesty of [[Wikipedia:Systemic bias]].
cheers
stuart