Welcome to my world! I am always puzzling about how to better gather stats
from Wikipedia in order to compare to Commons or other language-pedias. The
category system is hopelessly muddled (and sometimes even circular) and
doesn't match up across languages. The Dutch Wikipedia looks down on the
English categorization system and doesn't like overcategorization. They
take this so far that they now have thousands of painters in the
non-diffusable category "Dutch painters" and are one of the few
language-pedias without a category for "Dutch Golden Age painters".
That said, you have hit the nail on the head as far as the mission of
Wikidata goes. I have been an enthusiastic contributor there, mostly to the
paintings project called "Sum of all Paintings" (SoaP). Thanks to the work
of lots of GLAM enthusiasts there who work on artists in various collection
databases, slowly artist Wikidata items are being filled with useful data,
such as gender, place and date of birth, field of work, occupation, awards,
degrees, and so on. We have a ways to go, but thanks to the Wikidata gender
game we have lots of gendered data available now. I first started to keep
track of this for artist matches to the RKD database, which includes
gendered data, as a way to see if Wikipedia was at all on target. I assumed
that Wikidata has the "most famous" of the artists in the RKD database and
that, among these matches, the percentage of women would be higher than in
the RKD as a whole, because Wikipedians have been working on the gender gap
in content for several years now.
It is sort of hard to tell, because Wikidata is still so young, but I have
compiled some information here:
https://www.wikidata.org/wiki/User:Jane023/Gendergap_report
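For anyone who wants to try this kind of count themselves, here is a
minimal Python sketch of how gender totals for one occupation could be
pulled from the Wikidata Query Service. It is not how the report above was
compiled. The occupation item (Q1028181, painter), the restriction to items
with an English Wikipedia article, and all function names are assumptions
made for illustration.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Only the two most common values of P21 (sex or gender) are labelled here;
# any other value is reported under its raw item ID.
GENDER_LABELS = {"Q6581097": "male", "Q6581072": "female"}


def build_gender_query(occupation_qid, wiki="https://en.wikipedia.org/"):
    """SPARQL: count humans with this occupation and an article on `wiki`, by gender."""
    return (
        "SELECT ?gender (COUNT(DISTINCT ?person) AS ?n) WHERE {\n"
        "  ?person wdt:P31 wd:Q5 ;\n"
        f"          wdt:P106 wd:{occupation_qid} .\n"
        "  ?article schema:about ?person ;\n"
        f"           schema:isPartOf <{wiki}> .\n"
        "  OPTIONAL { ?person wdt:P21 ?gender . }\n"
        "}\n"
        "GROUP BY ?gender"
    )


def run_query(query):
    """Send the query to the live endpoint and return the result bindings."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    req = urllib.request.Request(
        url, headers={"User-Agent": "gendergap-sketch/0.1 (illustrative example)"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["results"]["bindings"]


def tally(bindings):
    """Turn SPARQL JSON bindings into a Counter keyed by gender label."""
    counts = Counter()
    for row in bindings:
        if "gender" in row:
            qid = row["gender"]["value"].rsplit("/", 1)[-1]
            label = GENDER_LABELS.get(qid, qid)
        else:
            label = "unknown"  # item has no P21 statement yet
        counts[label] += int(row["n"]["value"])
    return counts


def shares(counts):
    """Fraction of the total for each gender label."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}
```

Something like shares(tally(run_query(build_gender_query("Q1028181"))))
would hit the live service. Very broad occupations may time out there, in
which case narrowing the query (for example by country of citizenship) is
one option.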
On Sun, Feb 28, 2016 at 3:15 PM, Joe Corneli <holtzermann17(a)gmail.com>
wrote:
On Sun, Feb 28, 2016 at 8:24 AM, Jane Darnell
<jane023(a)gmail.com> wrote:
Oddly, there appears to be no solidarity among female Wikipedians that
take this into account, because I assume we have lots of female academic
Wikipedians who could easily write about other female academics in
academic articles (or on Wikipedia) if they wanted to and don't.
I have a very basic question, to do with navigating Wikipedia's
categories. Is there a sensible way to query the category system (or
extracts, e.g. to DBPedia) to produce a side-by-side comparison of how
many pages on ♀ vs ♂ [might as well add: vs ⚧, i.e. nonbinary] academics
there are in existence on Wikipedia?
I should say that as a user I've often found the category system
confusing, no less in this case.
https://en.wikipedia.org/wiki/Category:Academics -> 36 persons, 14
subcategories
of which one subcategory is:
https://en.wikipedia.org/wiki/Category:Women_academics -> 33 persons,
3 subcategories
of which one subcategory is:
https://en.wikipedia.org/wiki/Category:Women_academics_by_nationality
To take an example: Daniela Müller is on the list of Academics, but
not the list of Women Academics; neither is she listed on these
various subcategory pages:
https://en.wikipedia.org/wiki/Category:Women_historians -> 120 pages,
6 subcategories
https://en.wikipedia.org/wiki/Category:German_women_academics -> 69 pages
https://en.wikipedia.org/wiki/Category:Dutch_women_academics -> 6 pages
Nor, coming at this from another angle, is she listed on:
https://en.wikipedia.org/wiki/Category:Expatriate_academics -> 6
pages, 9 subcategories
... although her bio page says that she is a "German theologian and
church historian, who works in the Netherlands since 2007 and who
holds the chair of Church History/History of Christianity."
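The page and subcategory counts quoted above can also be fetched
programmatically rather than read off each category page. Below is a
minimal Python sketch using the MediaWiki API's prop=categoryinfo; the
function names are hypothetical and the User-Agent string is a placeholder.
Note that these counts cover direct members only, so a total for a whole
category tree would still require walking the subcategories (and guarding
against cycles).

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"


def categoryinfo_url(titles):
    """Build an API URL asking for member counts of the given categories."""
    params = {
        "action": "query",
        "prop": "categoryinfo",
        "titles": "|".join(titles),
        "format": "json",
        "formatversion": "2",
    }
    return API + "?" + urllib.parse.urlencode(params)


def parse_categoryinfo(response):
    """Map each category title to (direct pages, direct subcategories)."""
    counts = {}
    for page in response.get("query", {}).get("pages", []):
        info = page.get("categoryinfo")
        if info is None:
            continue  # the API returned no count data for this title
        counts[page["title"]] = (info["pages"], info["subcats"])
    return counts


def fetch_counts(titles):
    """Query the live API and return the parsed counts."""
    req = urllib.request.Request(
        categoryinfo_url(titles),
        headers={"User-Agent": "category-count-sketch/0.1 (illustrative example)"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_categoryinfo(json.load(resp))
```

For example, fetch_counts(["Category:Women_academics",
"Category:German_women_academics"]) would return the current numbers, which
will have changed since the figures quoted in this thread.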
narrative: I don't for a moment question that representation is very
unequal (and we could re-do this exercise along other dimensions as
you suggest Jane -- as evidenced by the German women vs Dutch women
comparison, combining dimensions produces revealing results)... but I
wish I knew just HOW unequal things are. At the moment it seems very
difficult to know the answer to that question -- but, again, this may
be because I'm naive about the art of wiki querying.
I know that some researchers have managed to get good data out about
this sort of thing, e.g.
«More information on Wikipedia deals with Europe than all of the
locations outside of Europe.»
Graham, M., Hogan, B., Straumann, R. K., and Medhat, A. 2014.
Uneven geographies of user-generated information: patterns of
increasing informational poverty. Annals of the Association of
American Geographers 104, 4, 746–764.