Welcome to my world! I am always puzzling about how to better gather stats from Wikipedia in order to compare them to Commons or other language-pedias. The category system is hopelessly muddled (and sometimes even circular) and doesn't match up across languages. The Dutch Wikipedia looks down on the English categorization system and doesn't like overcategorization. They take this so far that they now have thousands of painters in the non-diffusable category "Dutch painters" and are one of the few language-pedias without a category for "Dutch Golden Age painters".
That said, you have hit the nail on the head as far as the mission of Wikidata goes. I have been an enthusiastic contributor there, mostly to the paintings project called "Sum of all Paintings" (SoaP). Thanks to the work of lots of GLAM enthusiasts there who work on artists in various collection databases, artist Wikidata items are slowly being filled with useful data, such as gender, place and date of birth, field of work, occupation, awards, degrees, and so on. We have a ways to go, but thanks to the Wikidata gender game we have lots of gendered data available now. I first started to keep track of this for artist matches to the RKD database, which includes gendered data, as a way to see if Wikipedia was at all on target. I assumed that, of the artists in the RKD database, Wikidata has the "most famous" ones, and that among these matches Wikidata would have a higher percentage of women than the RKD percentage, because Wikipedians have been working on the gender gap in content for several years now.
It is sort of hard to tell, because Wikidata is still so young, but I have compiled some information here: https://www.wikidata.org/wiki/User:Jane023/Gendergap_report
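For anyone who wants to try this kind of count themselves, here is a minimal sketch in Python. It is not the method behind the report above, just one way it could be done. It assumes the items of interest are humans (P31 = Q5) carrying an RKDartists ID (P650) and that gender is read from "sex or gender" (P21). The function names and the sample rows are made up for illustration. The query text would be sent to the Wikidata Query Service (https://query.wikidata.org/sparql), which is not done here so the example runs offline.

```python
from collections import Counter


def gender_count_query(property_id: str = "P650") -> str:
    """Build a SPARQL query counting humans that have `property_id`, grouped by gender.

    P650 (RKDartists ID) is only an assumed default; pass another property
    to count matches against a different database.
    """
    return f"""
SELECT ?gender (COUNT(DISTINCT ?person) AS ?n) WHERE {{
  ?person wdt:P31 wd:Q5 ;
          wdt:{property_id} ?externalId .
  OPTIONAL {{ ?person wdt:P21 ?gender . }}
}}
GROUP BY ?gender
""".strip()


def tally(bindings):
    """Turn SPARQL JSON result rows into {gender QID or 'unknown': count}."""
    counts = Counter()
    for row in bindings:
        uri = row.get("gender", {}).get("value", "")
        # Items without a P21 statement come back with no ?gender binding.
        key = uri.rsplit("/", 1)[-1] or "unknown"
        counts[key] += int(row["n"]["value"])
    return dict(counts)


def shares(counts):
    """Convert raw counts into fractions of the total."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


# Made-up rows in the shape the query service returns under results.bindings.
SAMPLE_ROWS = [
    {"gender": {"value": "http://www.wikidata.org/entity/Q6581072"}, "n": {"value": "20"}},
    {"gender": {"value": "http://www.wikidata.org/entity/Q6581097"}, "n": {"value": "70"}},
    {"n": {"value": "10"}},
]

if __name__ == "__main__":
    print(gender_count_query())
    print(shares(tally(SAMPLE_ROWS)))
```

Q6581072 is "female" and Q6581097 is "male" on Wikidata. Keeping the "unknown" bucket visible matters, because a lot of items still have no gender statement at all.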
On Sun, Feb 28, 2016 at 3:15 PM, Joe Corneli holtzermann17@gmail.com wrote:
On Sun, Feb 28, 2016 at 8:24 AM, Jane Darnell jane023@gmail.com wrote:
Oddly, there appears to be no solidarity among female Wikipedians that take this into account, because I assume we have lots of female academic Wikipedians who could easily write about other female academics in academic articles (or on Wikipedia) if they wanted to and don't.
I have a very basic question, to do with navigating Wikipedia's categories. Is there a sensible way to query the category system (or extracts, e.g. to DBPedia) to produce a side-by-side comparison of how many pages on ♀ vs ♂ [might as well add: vs ⚧, i.e. nonbinary] academics there are in existence on Wikipedia?
I should say that as a user I've often found the category system confusing, no less in this case.
https://en.wikipedia.org/wiki/Category:Academics -> 36 persons, 14 subcategories
of which one subcategory is:
https://en.wikipedia.org/wiki/Category:Women_academics -> 33 persons, 3 subcategories
of which one subcategory is:
https://en.wikipedia.org/wiki/Category:Women_academics_by_nationality
To take an example: Daniela Müller is on the list of Academics, but not the list of Women Academics; neither is she listed on these various subcategory pages:
https://en.wikipedia.org/wiki/Category:Women_historians -> 120 pages, 6 subcategories
https://en.wikipedia.org/wiki/Category:German_women_academics -> 69 pages
https://en.wikipedia.org/wiki/Category:Dutch_women_academics -> 6 pages
Nor, coming at this from another angle, is she listed on:
https://en.wikipedia.org/wiki/Category:Expatriate_academics -> 6 pages, 9 subcategories
... although her bio page says that she is a "German theologian and church historian, who works in the Netherlands since 2007 and who holds the chair of Church History/History of Christianity."
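To make the problem concrete, here is a rough Python sketch of what "collect every page under a category" involves. It works on a small in-memory toy graph, not live Wikipedia data. The nesting below "Women academics by nationality", the loop back to "Women academics", and the two "Hypothetical person" pages are assumptions added for illustration. The `seen` set is what keeps the walk from spinning forever when categories are circular.

```python
from collections import deque


def pages_under(root, subcats, members, max_depth=5):
    """Collect pages in `root` and its subcategories, breadth-first.

    `subcats` maps a category to its child categories and `members` maps a
    category to its pages. Each category is visited once, so loops are safe.
    """
    seen = {root}
    queue = deque([(root, 0)])
    pages = set()
    while queue:
        category, depth = queue.popleft()
        pages.update(members.get(category, ()))
        if depth == max_depth:
            continue  # do not descend further than max_depth levels
        for child in subcats.get(category, ()):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return pages


# Toy data only; the last entry is a deliberate, made-up loop.
TOY_SUBCATS = {
    "Academics": ["Women academics"],
    "Women academics": ["Women academics by nationality"],
    "Women academics by nationality": ["German women academics", "Dutch women academics"],
    "Dutch women academics": ["Women academics"],
}
TOY_MEMBERS = {
    "Academics": ["Daniela Müller"],
    "German women academics": ["Hypothetical person A"],
    "Dutch women academics": ["Hypothetical person B"],
}

if __name__ == "__main__":
    everyone = pages_under("Academics", TOY_SUBCATS, TOY_MEMBERS)
    women = pages_under("Women academics", TOY_SUBCATS, TOY_MEMBERS)
    # Pages filed only in the parent never show up in the women's subtree.
    print(sorted(everyone - women))
```

Even with a complete walk, a count built this way is only as good as the filing: a page that sits in the parent category but in none of the gendered subcategories is invisible to the comparison.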
narrative: I don't for a moment question that representation is very unequal (and we could re-do this exercise along other dimensions, as you suggest, Jane -- as evidenced by the German women vs Dutch women comparison, combining dimensions produces revealing results)... but I wish I knew just HOW unequal things are. At the moment it seems very difficult to answer that question -- but, again, this may be because I'm naive about the art of wiki querying.
I know that some researchers have managed to get good data out about this sort of thing, e.g.
«More information on Wikipedia deals with Europe than all of the locations outside of Europe.»
Graham, M., Hogan, B., Straumann, R. K., and Medhat, A. 2014. Uneven geographies of user-generated information: patterns of increasing informational poverty. Annals of the Association of American Geographers 104, 4, 746–764.
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l