Hello Piotr and Gerard, 

    I think a competing hypothesis would be the "male gaze". That is to say, higher female representation may not reflect a culture (defined as national, ethnic, linguistic or regional, not macho/feminine) but rather a gender-interest bias. Higher female representation could thus indicate a more male-dominated culture, which runs against the theoretical assumption of Piotr's research.

    Note that the East Asian Wikipedians I know, especially those who edit Chinese Wikipedia, are predominantly very young. Some of them may be highly interested in the opposite sex.

    Check the following category pages as examples:
(1a) Actresses of every country in the world
http://zh.wikipedia.org/wiki/Category:%E5%90%84%E5%9C%8B%E5%A5%B3%E6%BC%94%E5%93%A1
(1b) Male actors of every country in the world
http://zh.wikipedia.org/wiki/Category:%E5%90%84%E5%9B%BD%E7%94%B7%E6%BC%94%E5%91%98

(2a) Japanese AV (i.e. porn) actresses
http://zh.wikipedia.org/w/index.php?title=Category:%E6%97%A5%E6%9C%ACAV%E5%A5%B3%E5%84%AA
(2b) Male Japanese AV (i.e. porn) actors
http://zh.wikipedia.org/w/index.php?title=Category:%E6%97%A5%E6%9C%ACAV%E7%94%B7%E5%84%AA

    It seems quite clear that the male gaze hypothesis applies here: there is more female representation simply because women are there to be consumed by men or boys.

    So one of my suggestions for research is to select a few professional categories of interest (say, politicians, poets, entertainers, etc.) and do some cross-tab analysis of gender by category for each language version.
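
    A minimal sketch of such a cross-tab, in plain Python. The records, the profession labels and the function name crosstab_female_share are all hypothetical and made up for illustration; they are not taken from the WIGI data or its toolset.

```python
from collections import Counter

# Hypothetical records: (language version, profession, gender).
# These rows are invented for illustration; they are not WIGI data.
biographies = [
    ("zh", "actor", "female"), ("zh", "actor", "female"),
    ("zh", "actor", "female"), ("zh", "actor", "male"),
    ("zh", "politician", "female"), ("zh", "politician", "male"),
    ("zh", "politician", "male"), ("zh", "politician", "male"),
    ("en", "actor", "female"), ("en", "actor", "male"),
    ("en", "politician", "female"), ("en", "politician", "male"),
]


def crosstab_female_share(records):
    """Return {(language, profession): share of female biographies}."""
    totals = Counter()
    females = Counter()
    for language, profession, gender in records:
        totals[(language, profession)] += 1
        if gender == "female":
            females[(language, profession)] += 1
    return {key: females[key] / totals[key] for key in totals}


shares = crosstab_female_share(biographies)
for (language, profession), share in sorted(shares.items()):
    print(f"{language:>3} {profession:<12} {share:.2f}")
```

    If a high overall female share in one language version comes mostly from entertainers and not from politicians or poets, that would be consistent with the male gaze hypothesis.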

    Thus, I would be extremely cautious about using the current metrics/methods as a viable "gender inequality index".

    As a proponent of the "data normalization" and "geographic normalization" methods myself, I would distinguish two sets of comparisons: one is a cross-country or cross-language-version comparison of absolute values, the other a cross-country or cross-language-version comparison of "normalized" values. By geographic normalization, I mean that researchers must gather another set of cross-country or cross-language datasets that capture some aspects of reality "external" to Wikipedia. In this case, I would compare the gender ratio of politicians represented on Wikipedia against the offline gender ratio of politicians. In other words, "data normalization" allows researchers to compare which language versions are more or less equal (and by how much) than the corresponding offline societies.
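
    One possible way to compute such a normalized value is sketched below. The numbers are invented, the function name normalized_share is hypothetical, and dividing the Wikipedia share by the offline share is only one assumed choice of normalization; matching a language version to an offline society is itself an assumption the researcher has to justify.

```python
# Hypothetical shares of women among politicians (made-up numbers).
# Keys are language versions; pairing each one with an offline society
# is an assumption made only for this illustration.
wiki_female_share = {"zh": 0.20, "en": 0.15}      # from biography articles
offline_female_share = {"zh": 0.25, "en": 0.20}   # from an external dataset


def normalized_share(wiki_share, offline_share):
    """Ratio of the Wikipedia share to the offline share.

    A value of 1.0 means Wikipedia mirrors the offline ratio; a value
    below 1.0 means women are less represented on Wikipedia than offline.
    """
    if offline_share <= 0:
        raise ValueError("offline share must be positive")
    return wiki_share / offline_share


normalized = {
    lang: normalized_share(wiki_female_share[lang], offline_female_share[lang])
    for lang in wiki_female_share
}
for lang, value in sorted(normalized.items()):
    print(f"{lang}: absolute={wiki_female_share[lang]:.2f} normalized={value:.2f}")
```

    The absolute and normalized comparisons can rank language versions differently, which is why I would report both.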

    BTW, the methods you develop to extract gender from biography articles for large-scale analysis may also be repurposed to study other dimensions. One dimension that would interest me is nationality: it would be interesting to see the coverage, focus or bias of a language version on people based on their nationalities. Age might be another one.

Best,
han-teng liao



2015-01-11 19:01 GMT+02:00 Gerard Meijssen <gerard.meijssen@gmail.com>:
Hoi,
Having read it, I find it is still very much Wikipedia oriented. It makes use of the toolset by Markus. That is fine. The notion of diversity and notability is also very much culturally defined. It would be nice to know how the different Wikipedias accept notability of people from other cultures and if it impacts the diversity of their own articles.

I have found that many people do not have an article in the languages of their own cultures. Often it has to do with an interest in a domain that is of more relevance to the other culture.

Diversity is very much part of a domain; in Roman Catholicism male dominance is obvious. I am curious if diversity in gender is affected by such considerations and if items with a single article are more in line with what is the norm for a culture, a domain.
Thanks,
     GerardM

On 10 January 2015 at 11:51, Piotr Konieczny <piokon@post.pl> wrote:
Here (http://notconfusing.com/preliminary-results-from-wigi-the-wikipedia-gender-inequality-index/) are some early findings from a research project I am involved in (together with Maximilian Klein). (To find out more about the project, see https://meta.wikimedia.org/wiki/Research:Wikipedia_Gender_Inequality_Index and its talk page). We are very curious what you think (don't hesitate to be critical). What we would really appreciate would be any alternative hypotheses (to the one presented) that could try to explain why post-1950s Confucian and South Asian clusters seem so much more inclusive of female biographies than others (including the "Western" clusters). Are we seeing a data error, or something else - and if so, what?

--
Piotr Konieczny, PhD
http://hanyang.academia.edu/PiotrKonieczny
http://scholar.google.com/citations?user=gdV8_AEAAAAJ
http://en.wikipedia.org/wiki/User:Piotrus


_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

