[Foundation-l] statistics
Robert Scott Horning
robert_horning at netzero.net
Wed Apr 5 14:55:46 UTC 2006
kent emerson wrote:
>I am preparing a Masters thesis on wikipedia and am questioning whether
>wikipedia users are representative of the regular population. I would
>like to know what demographic uses wikipedia? Have you collected any
>statistics on the representative makeup of those who work on your site
>including gender, age, geographic location, income, education, ownership
>of home, marital status? If so, would you feel comfortable sharing it
>with me for the purposes of my research.
>
>I would appreciate if you can help.
>
>Kent Emerson.

There is a huge amount of raw information on the user pages that can help
you prepare such a demographic cross-section, but it is not organized
into neat tables; it is only raw data. It is also self-reported, so it
is somewhat suspect as well. Some users put up special "templates"
(especially on Wikipedia) that announce different skills, the most
typical being a proclamation of which languages you speak (the Babel
templates). This has since been expanded to programming languages,
political leanings (including internal politics on Wikimedia projects),
schools of thought, marital status, hobbies, geographic origin and other
interests. Some of the information can also simply be found in the raw
text of the user page.

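To give a flavour of the extraction step, here is a minimal Python sketch
that pulls Babel language codes out of a user page's wikitext. It assumes
the combined box is written as {{Babel|en|de-2|fr-1}} and single boxes as
{{User de-2}}, with a level of 0-5 or N (native). Real pages use many more
variants, so the patterns and the function name babel_languages are
hypothetical and would need checking against actual user pages.

```python
import re

# Assumed syntax (hypothetical): {{Babel|en|de-2|fr-1}} and {{User de-2}}.
BABEL_RE = re.compile(r"\{\{\s*[Bb]abel\s*\|([^}]*)\}\}")
USER_BOX_RE = re.compile(r"\{\{\s*[Uu]ser[ _]([a-z]{2,3})(?:-([0-5N]))?\s*\}\}")
# One Babel parameter: a 2-3 letter code with an optional "-level" suffix.
ENTRY_RE = re.compile(r"([a-z]{2,3})(?:-([0-5N]))?")


def babel_languages(wikitext):
    """Return {language_code: level} declared in a page's wikitext.

    A bare code with no level (e.g. "en") is recorded as "N" (native),
    which is an assumption about how such boxes are usually read.
    """
    found = {}
    for match in BABEL_RE.finditer(wikitext):
        for part in match.group(1).split("|"):
            entry = ENTRY_RE.fullmatch(part.strip())
            if entry:  # skips named parameters such as "plain=1"
                code, level = entry.groups()
                found[code] = level or "N"
    # Individual {{User xx-n}} boxes; a missing level comes back as "".
    for code, level in USER_BOX_RE.findall(wikitext):
        found[code] = level or "N"
    return found
```

For example, babel_languages("{{Babel|en|de-2}} {{User fr-1}}") gives
{'en': 'N', 'de': '2', 'fr': '1'}. Because the data is self-reported, a
result like this tells you what users claim, not what they can do.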
A statistical analysis of this very rich set of user-page information
might make a very interesting study, but pulling all of it together is
going to take quite a bit of work and will be a very tedious process.
Compared with waiting for people to e-mail you their responses, though,
it would give you a much larger set of data to work with, and one that
can be cross-referenced against articles and activity levels to add
extra research variables. If you are really interested in doing
something like this (and it would be worthy of a master's degree), I
would strongly recommend that you obtain a full dump of one of the
Wikipedia databases and get a skilled database guru to help you "mark
up" users according to the criteria you want to use in the study, and
then compare that with other factors such as their status as
administrators, the articles they have edited, and their activity level.
This isn't going to be handed to you on a silver platter, but the data
is available if you are willing to do the work of organizing it.

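As a rough illustration of what that database work involves, the sketch
below streams an XML dump with Python's standard
xml.etree.ElementTree.iterparse, counts revisions per registered username
as a crude activity level, and keeps the latest text of each user page.
It assumes the MediaWiki XML export layout (page, title, ns, revision,
contributor/username and text elements) and that user pages are marked
as namespace 2 by an <ns> element; I believe older dumps lack <ns>, in
which case checking for a "User:" title prefix would be needed instead.
The name scan_dump is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Drop the '{namespace}' prefix the export schema puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def scan_dump(source):
    """Return (revision count per username, latest text per user page).

    `source` is a file path or binary file object holding an XML dump.
    Assumes revisions appear in chronological order within each page.
    """
    edits = Counter()
    user_pages = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title = ns = text = None
        for child in elem:
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "ns":
                ns = child.text
            elif name == "revision":
                for field in child:
                    fname = local(field.tag)
                    if fname == "contributor":
                        # Anonymous edits carry <ip> instead of <username>.
                        for part in field:
                            if local(part.tag) == "username" and part.text:
                                edits[part.text] += 1
                    elif fname == "text":
                        text = field.text or ""  # later revisions overwrite
        if ns == "2" and title:  # assumed: namespace 2 holds user pages
            user_pages[title] = text
        elem.clear()  # release this page's contents; dumps are very large
    return edits, user_pages
```

Something like edits, pages = scan_dump("dump.xml") (file name
hypothetical) would then let you run the user-page text through whatever
extraction you choose and line it up against edit counts. Administrator
status is not in the page dump and would presumably have to come from a
separate table; MediaWiki keeps group membership in user_groups.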
One other place to look for statistical data that has already been
developed for Wikimedia projects is the collection of statistical
analysis pages by Erik Zachte, which can be found here:
http://stats.wikimedia.org/

These tables are oriented more toward measuring the growth of Wikimedia
projects than toward demographic comparisons, but they also include
multilingual data that might be useful for you to review. Chronological
information is also available in the raw database dump, and is another
factor to consider in this sort of study.

--
Robert Scott Horning