Welcome to my world! I am always puzzling over how to gather better stats from Wikipedia in order to compare them with Commons or other language-pedias. The category system is hopelessly muddled (and sometimes even circular) and doesn't match up across languages. The Dutch Wikipedia looks down on the English categorization system and dislikes overcategorization. They take this so far that they now have thousands of painters in the non-diffusing category "Dutch painters" and are one of the few language-pedias without a category for "Dutch Golden Age painters".
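Because categories can loop back on themselves, any count that walks a category tree needs a guard against revisiting a category. A minimal sketch of that idea follows, using a small made-up in-memory graph rather than live data. The function name `collect_pages` and the toy category contents are assumptions for illustration only; in practice the subcategory and page lists would have to be fetched first, for instance through the MediaWiki API's `list=categorymembers` query.

```python
from collections import deque


def collect_pages(subcats, pages, root):
    """Return the set of pages reachable from `root`, visiting each category once."""
    seen = {root}
    queue = deque([root])
    found = set()
    while queue:
        cat = queue.popleft()
        found.update(pages.get(cat, ()))
        for child in subcats.get(cat, ()):
            if child not in seen:  # guard against circular categories
                seen.add(child)
                queue.append(child)
    return found


# Hypothetical toy data; the last SUBCATS entry makes the graph circular on purpose.
SUBCATS = {
    "Academics": ["Women academics"],
    "Women academics": ["Women academics by nationality"],
    "Women academics by nationality": ["Academics"],
}
PAGES = {
    "Academics": ["Person A"],
    "Women academics": ["Person B"],
    "Women academics by nationality": ["Person B", "Person C"],
}

print(sorted(collect_pages(SUBCATS, PAGES, "Academics")))
```

Collecting pages into a set also keeps a person who sits in several subcategories from being counted twice.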

That said, you have hit the nail on the head as far as the mission of Wikidata goes. I have been an enthusiastic contributor there, mostly to the paintings project called "Sum of all Paintings" (SoaP). Thanks to the work of lots of GLAM enthusiasts there who work on artists in various collection databases, artist Wikidata items are slowly being filled with useful data, such as gender, place and date of birth, field of work, occupation, awards, degrees, and so on. We have a ways to go, but thanks to the Wikidata gender game we have lots of gendered data available now. I first started to keep track of this for artist matches to the RKD database, which includes gendered data, as a way to see if Wikipedia was at all on target. I assumed that Wikidata has the "most famous" of the artists in the RKD database, and that among these matches Wikidata would show a higher percentage of women than the RKD as a whole, because Wikipedians have been working on the gender gap in content for several years now.

It is sort of hard to tell, because Wikidata is still so young, but I have compiled some information here: 
https://www.wikidata.org/wiki/User:Jane023/Gendergap_report
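For anyone who wants to try a similar count themselves, here is a minimal sketch in Python, not the method behind the report above. It assumes the Wikidata Query Service endpoint at https://query.wikidata.org/sparql, and uses P650 (RKDartists ID) and P21 (sex or gender) to group matched artists by gender. The helper names `run_query` and `gender_shares` are made up for this example, and the numbers in `SAMPLE` are placeholders that only show the shape of a response; they are not real counts.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# P650 = RKDartists ID, P21 = sex or gender.
QUERY = """
SELECT ?gender ?genderLabel (COUNT(DISTINCT ?person) AS ?n) WHERE {
  ?person wdt:P650 ?rkd ;
          wdt:P21 ?gender .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?gender ?genderLabel
"""


def run_query(query, user_agent):
    """Send a SPARQL query to the endpoint and return the parsed JSON (needs network)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def gender_shares(result):
    """Turn SPARQL JSON rows of (genderLabel, n) into a label -> fraction mapping."""
    counts = {}
    for row in result["results"]["bindings"]:
        label = row.get("genderLabel", {}).get("value", "unknown")
        counts[label] = counts.get(label, 0) + int(row["n"]["value"])
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


# Placeholder response in the SPARQL JSON results shape; NOT real figures.
SAMPLE = {
    "results": {
        "bindings": [
            {"genderLabel": {"value": "female"}, "n": {"value": "25"}},
            {"genderLabel": {"value": "male"}, "n": {"value": "75"}},
        ]
    }
}

print(gender_shares(SAMPLE))
```

To run it against live data, something like `gender_shares(run_query(QUERY, "my-stats-script/0.1 (contact address here)"))` should work, with a User-Agent string of your own. Swapping the `wdt:P650` line for an occupation constraint on P106 would give the same kind of breakdown for academics, subject to how completely those items are filled in.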

On Sun, Feb 28, 2016 at 3:15 PM, Joe Corneli <holtzermann17@gmail.com> wrote:
On Sun, Feb 28, 2016 at 8:24 AM, Jane Darnell <jane023@gmail.com> wrote:

> Oddly, there appears to be no solidarity among female Wikipedians that take
> this into account, because I assume we have lots of female academic
> Wikipedians who could easily write about other female academics in academic
> articles (or on Wikipedia) if they wanted to and don't.

I have a very basic question, to do with navigating Wikipedia's
categories.  Is there a sensible way to query the category system (or
extracts, e.g. to DBPedia) to produce a side-by-side comparison of how
many pages on ♀ vs ♂ [might as well add: vs ⚧, i.e. nonbinary] academics
there are in existence on Wikipedia?

I should say that as a user I've often found the category system
confusing, no less in this case.

https://en.wikipedia.org/wiki/Category:Academics -> 36 persons, 14 subcategories

of which one subcategory is:

https://en.wikipedia.org/wiki/Category:Women_academics -> 33 persons,
3 subcategories

of which one subcategory is:

https://en.wikipedia.org/wiki/Category:Women_academics_by_nationality

To take an example: Daniela Müller is on the list of Academics, but
not the list of Women Academics; neither is she listed on these
various subcategory pages:

https://en.wikipedia.org/wiki/Category:Women_historians -> 120 pages,
6 subcategories
https://en.wikipedia.org/wiki/Category:German_women_academics -> 69 pages
https://en.wikipedia.org/wiki/Category:Dutch_women_academics -> 6 pages

Nor, coming at this from another angle, is she listed on:

https://en.wikipedia.org/wiki/Category:Expatriate_academics -> 6
pages, 9 subcategories

... although her bio page says that she is a "German theologian and
church historian, who works in the Netherlands since 2007 and who
holds the chair of Church History/History of Christianity."

narrative: I don't for a moment question that representation is very
unequal (and we could re-do this exercise along other dimensions as
you suggest Jane -- as evidenced by the German women vs Dutch women
comparison, combining dimensions produces revealing results)... but I
wish I knew just HOW unequal things are.  At the moment it seems very
difficult to know the answer to that question -- but, again, this may
be because I'm naive about the art of wiki querying.

I know that some researchers have managed to get good data out about
this sort of thing, e.g.

  «More information on Wikipedia deals with Europe than all of the
locations outside of Europe.»

Graham, M., Hogan, B., Straumann, R. K., and Medhat, A. 2014.
Uneven geographies of user-generated information: patterns of
increasing informational poverty. Annals of the Association of
American Geographers 104, 4, 746–764.

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l