A copy of this post can be found at http://ozziesport.com/2010/10/expanded-profile-of-australian-en-wp-users/
My dissertation topic involves doing a demographic and geographic
study of Australian sport fandom online. There are several sites and
social networks where you can get publicly available demographic data to
begin to formulate a picture of the user population, and then segment
that population out by interest in a league, sport and athlete. I’ve
spent a lot of time looking at Twitter, Facebook and LiveJournal.
Recently, partly because of a
trip to the Wikimedia Foundation and discussions with a few people
at UCNISS, my interest in who was
contributing to Australian sport wiki articles on Wikipedia increased.
Finding out who edited Wikipedia articles using publicly available
information is a bit of a challenge. The most reliable information about
who edited an article comes from IP addresses, which can provide
an idea of the geographic location of the contributor. It is easy
enough, with the help of a friend, to create a tool that pulls the
history of a Wikipedia article, gets a list of IP addresses that edited
the article, and feeds each IP address into another tool that pulls up the
general location of the contributor. (One of my favorite
visualizations of this type of information is WikipediaVision.) The
data isn’t always accurate, and if I were looking primarily at New
Zealand, a country without its own dedicated IP address range, this
would be even less reliable. Still, for my purposes, this data works
pretty well.
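
As a rough illustration of the middle step of such a tool, here is a minimal Python sketch that takes the editor names from an article's history and keeps only those that are IP addresses, that is, the anonymous edits. The function name and the sample names are hypothetical, and this is not the tool my friend built. Fetching the history and looking up a location for each address are left out because they depend on external services.

```python
import ipaddress


def anonymous_editor_ips(editors):
    """Return the unique editor names that parse as IPv4 or IPv6 addresses.

    `editors` is assumed to be a list of names copied from an article's
    revision history. Registered account names are skipped.
    """
    unique_ips = []
    for name in editors:
        try:
            ip = ipaddress.ip_address(name.strip())
        except ValueError:
            continue  # a registered username, not an IP address
        text = str(ip)
        if text not in unique_ips:
            unique_ips.append(text)
    return unique_ips


# Hypothetical sample: two anonymous editors (one repeated) and one account.
sample = ["203.0.113.7", "ExampleUser", "203.0.113.7", "2001:db8::1"]
ips = anonymous_editor_ips(sample)
```

Each address in the resulting list would then be passed to whatever geolocation tool is available.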
This data is still pretty limited. A lot of articles
are edited by non-anonymous users. Sometimes it is possible to get
demographic and geographic information about Wikipedia contributors by
viewing their profile pages. Doing this manually can be time consuming
if an article has a large number of contributors, as you need to
view a lot of user pages. That becomes a deterrent to
collecting geographic information about article contributors.
I was looking for a more time-effective and accurate method of
collecting geographic and demographic information about contributors
that is publicly available on their user pages. The easiest and
quickest way to get this information on a mass scale is to utilize user
box information. Many user boxes, when included on a user page, put the
user into a category. These categories are often then linked through
the Wikipedian
category structure. Beyond that, user boxes involve templates. It
is easy to get a list of articles (user pages) that the template is
included on.
The methodology that I selected from this point is rather
straightforward. It involved:
1. Select a category.
2. Copy and paste the list of articles (user pages) in the selected
category to an Excel spreadsheet. Sort the list alphabetically. Copy
and paste only the user pages to Notepad. Replace * User with blank.
Copy and paste this list back to Excel.
3. Create a filter where the cell contains / . Select those cells.
Copy them to Notepad and replace / with [tab] in order to remove user
subpages from the list. Copy this back to Excel. Select only the
column with usernames.
4. Run an advanced filter in order to remove all duplicate rows.
5. Copy this list back to the dedicated spreadsheet. Label all those
users with the category from which they were pulled in a unique column.
6. Repeat steps 1 to 5 until all the categories that you want to have
included are included.
7. Merge/Group all the rows by username.
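
The steps above could probably be automated along these lines. This is only a sketch, not what I actually did: it assumes each category's member list has already been copied out as page titles that start with "User:", and the function names and sample data are hypothetical.

```python
from collections import defaultdict


def usernames_from_titles(titles):
    """Steps 2 to 4: keep user pages, cut off subpages, drop duplicates."""
    names = set()
    for title in titles:
        title = title.strip()
        if not title.startswith("User:"):
            continue  # not a user page (for example a talk page)
        # "User:Alice/Userboxes" -> "Alice"
        name = title[len("User:"):].split("/", 1)[0].strip()
        if name:
            names.add(name)
    return names


def merge_by_username(category_members):
    """Steps 5 to 7: map each username to a sorted list of its categories."""
    merged = defaultdict(set)
    for category, titles in category_members.items():
        for name in usernames_from_titles(titles):
            merged[name].add(category)
    return {name: sorted(categories) for name, categories in merged.items()}


# Hypothetical input: category name -> page titles listed in that category.
example = {
    "Wikipedians in Victoria": ["User:Alice", "User:Alice/Userboxes", "User:Bob"],
    "Atheist Wikipedians": ["User:Alice", "User talk:Carol"],
}
merged = merge_by_username(example)
```

Grouping in a dictionary keyed by username does the same job as the merge in step 7, so no spreadsheet or database is needed for that part.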
This method may not be the most efficient way of doing
this. It can probably be improved by automating some of these steps. In
my case, I could not complete step 7 using Excel. I had to
e-mail the file to @woganmay,
who I believe converted the file to a MySQL database, used the group
feature, converted the results back to CSV and e-mailed the file to me.
I did not complete this for every category. Some
categories did not seem worth the time as they had too few user
pages to be included. In other cases, the categories were just too big
to do. This included all the members of User de, User en, User es, User
fr, User it, User jp. Only a selected number of categories were
included because of time constraints. Data gathering was focused on
categories that I perceived would have the greatest number of
Australians and other possible contributors to Australian related
articles. Once these categories were largely exhausted, categories with
between 1,00 and 5,000 articles were selected.
There are all sorts of limitations to this data. First, not everyone
includes userboxes on their profile pages. This means that there could
be a lot more Australians on Wikipedia than indicated by userbox
inclusion on a user page. The assumption for the resulting data is that
proportional representation exists for various categories. So while
there are X Christians and Y Atheists in the userbox data, the
assumption is that the relationship between X and Y will always be
proportional to the actual population on Wikipedia. Whatever data is
available thus has to be viewed as good enough, or supplemented by going
to individual user pages to see if other information is available when a
user appears in the history of an article but no information is available
for them in the collected data.
Second, even when userboxes do exist, there are often useful pieces of
information that are missing. For example, in an Australian context,
there is a userbox for Rugby League fans. There is not however a
userbox for Australian rules footy fans. There are also not user boxes
and categories for fans of NRL or AFL teams. (This type of user box and
category exists for National Hockey League teams.)
About halfway through this process, I realized that this data could
be useful for analysis beyond who is editing Wikipedia. At the moment,
I’ve only totaled data I have for Australians. It is pretty fascinating
and would be neat to go further with: How does the proportional size of
the Australian Wikipedian population compare against the actual
population? Does the size of the Australian Atheist versus Christian
community accurately reflect the proportions in Australian society? Or is
the Australian Wikipedian community demographically distinct from the
greater population?
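
To make the Atheist versus Christian question concrete, here is a small Python sketch that works out the ratio and shares from the two userbox counts in the Religion table in this post (97 Atheist, 47 Christian). The function names are made up for illustration. Figures for the wider Australian population are not included here and would need to come from an outside source before any comparison could be made.

```python
def userbox_ratio(count_a, count_b):
    """Return how many users carry userbox A for every user with userbox B."""
    if count_b == 0:
        raise ValueError("count_b must be greater than zero")
    return count_a / count_b


def shares(counts):
    """Return each label's share of the combined total of the given counts."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Counts taken from the Religion table in this post.
religion_counts = {"Atheist": 97, "Christian": 47}

ratio = userbox_ratio(religion_counts["Atheist"], religion_counts["Christian"])
religion_shares = shares(religion_counts)
```

With these counts there are roughly two Atheist userboxes for every Christian one. Whether that matches Australian society depends on the proportionality assumption described earlier.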
The following tables include the data based on people who were
included in Wikipedians
in Australia and its subcategories and Australian
Wikipedians. A copy of the raw data can be found at October
9 – Wikipedia English Data – Australians.xls. The data is provided
without comment though any attempts at explaining the patterns found
are very much appreciated.
Country | Count
Bangladesh | 3
Canada | 2
Egypt | 2
India | 1
Indonesia | 2
Ireland | 3
Jamaica | 2
Japan | 5
New Zealand | 17
Papua New Guinea | 1
Republic of Ireland | 5
Singapore | 5
South Africa | 2
South Korea | 1
Sri Lanka | 2
Tanzania | 2
Turkey | 2
United States | 16
State | Count
Australian Capital Territory | 89
Canterbury | 1
New South Wales | 345
Northern Territory | 5
Otago | 1
Queensland | 208
South Australia | 144
Southland | 1
Tasmania | 54
Victoria | 370
Wellington | 2
Western Australia | 145
Degree | Count
BA degrees | 21
BCom degrees | 2
BCS degrees | 3
BE degrees | 18
BMus degrees | 1
BS degrees | 41
MS degrees | 5
PhD degrees | 18
University/Alma Mater | Count
Australian National University | 14
Avondale College | 1
Charles Sturt University | 1
Curtin University of Technology | 7
Deakin University | 6
Flinders University | 7
Griffith University | 1
James Cook University | 2
La Trobe University | 2
Macquarie University | 5
Massey University | 1
Monash University | 19
Royal Melbourne Institute of Technology | 10
University of Adelaide | 4
University of Alberta | 1
University of Canberra | 3
University of Melbourne | 21
University of New England | 4
University of New South Wales | 24
University of Newcastle | 8
University of Sydney | 16
University of Tasmania | 3
University of Technology, Sydney | 4
University of Western Australia | 11
University of Wollongong | 4
Victorian College of the Arts | 1
Student type | Count
Business students | 3
College students | 26
Law students | 9
Medical students | 8
University students | 59
Website | Count
Open Directory Project | 1
OpenStreetMap | 2
Wookieepedia | 1
Religion | Count
Anglican and Episcopalian | 8
Antitheist | 3
Atheist | 97
Buddhist | 13
Catholic | 7
Christian | 47
Eastern Orthodox | 2
Hindu | 1
Jewish | 4
Lutheran | 1
Methodist | 2
Muslim | 4
Non-denominational Christian | 2
Objectivist | 2
Pastafarian | 17
Presbyterian | 3
Protestant | 11
Roman Catholic | 10
Ethnicity and nationality | Count
Argentine | 2
Bangladeshi | 2
British | 3
English | 10
Latino/Hispanic | 1
Skill | Count
Aircraft pilots | 5
Artists | 3
Engineers | 17
Filmmakers | 17
Homebrewers | 10
Mechanical engineers | 1
Professional writers | 1
Surfers | 2
Profession | Count
Accountants | 2
Actor | 5
Actuaries | 2
Aircraft pilots | 5
Biologist | 9
Broadcasters | 5
Chemist | 6
Composers | 28
Computer scientists | 7
Engineers | 17
Filmmakers | 17
Geoscientists | 2
Mechanical engineers | 1
Scientists | 7
Teacher | 18
University teacher | 4
Web designers | 2
Web developers | 1
Interest | Count
Chemistry | 27
Cooking | 1
Physics | 34
Strings (physics) | 6
Sports | Count
Cavers | 2
Cross-country runners | 4
Dancers | 3
Detroit Red Wings fans | 2
Equestrians | 2
Fencers | 2
Geocachers | 8
Hikers | 2
Hunters | 7
Outdoor pursuits | 2
Rugby league fans | 50
Runners | 2
Sailing | 1
Scuba divers | 8
Snowboarders | 2
Swimmers | 16
Swing dancers | 1
Toronto Maple Leafs fans | 1
Ultimate Fighting Championship fans | 2
Vancouver Canucks fans | 3
WikiProject Tennis members | 4
Wikipedia Status | Count
Administrator hopefuls | 41
Administrators | 45
Administrators who will provide copies of deleted articles | 11
Bureaucrats | 1
Contribute to Wikimedia Commons | 1
Create userboxes | 3
Opted out of automatic signing | 4
Reviewers | 10
Rollbackers | 27
Service Award Level 01 | 12
Service Award Level 02 | 14
Service Award Level 03 | 10
Service Award Level 04 | 5
Service Award Level 05 | 6
Service Award Level 06 | 9
Service Award Level 07 | 11
Service Award Level 08 | 3
Service Award Level 09 | 2
Wikimedia Commons administrators | 2
Philosophy | Count
Hindu | 1
Humanist | 6
Materialist | 9
Pastafarian | 16
Theist | 9
--
twitter: purplepopple
blog: ozziesport.com