This is just to announce that the final draft of my PhD thesis, "Wikipedia: A quantitative analysis", is finished. Only minor appendices remain, covering general background on some of the statistical methods I applied.
It will (hopefully) be approved for presentation in just a few days, though bureaucracy will delay the "viva voce" until the middle of March (more or less).
It includes the first quantitative analysis comparing the top 10 language versions of Wikipedia, as of Dec. 2007 (to allow a fair comparison of EN with other languages). Among other interesting insights, it presents a complete study of the activity of logged-in authors, articles and talk pages, and of the evolution over time of the distributions of key parameters (distinct authors per article, articles per author, revisions per author/article, etc.).
It also offers a more in-depth study of the inequality of contributions by logged-in authors, and also across articles. Likewise, it presents a complete survival analysis to examine the average lifetime of Wikipedia contributors, focusing on the transitions first contribution --> joining the core --> core membership --> leaving the core --> abandoning the project.
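For readers unfamiliar with inequality measures, here is a minimal sketch of one common choice, the Gini coefficient, applied to per-author contribution counts. This is only an illustration: the announcement does not say which inequality measures the thesis uses, and the function name `gini` and the sample counts are assumptions made up for this example.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0.0 means everyone contributed equally; values close to 1.0 mean
    a few contributors account for almost everything.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on data sorted in ascending order:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical revision counts per author (not data from the thesis).
equal_effort = [5, 5, 5, 5]
one_heavy_editor = [0, 0, 0, 20]
print(gini(equal_effort))
print(gini(one_heavy_editor))
```

With n authors, the largest value this formula can reach is (n - 1) / n, so small samples never hit exactly 1.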
Finally, we also examine some very basic metrics for quality, analyze the common quantitative patterns of reputable authors and high-quality content, and try to infer the implications of all these findings for the future sustainability of the Wikipedia workflow model in the coming years.
If any of you are interested in having a look at the (still draft) manuscript, I accept on-demand access requests to the repo :).
I'll wait until after the public defense and the reviewers' comments to publish a summary of our conclusions.
A reminder that WikiSym 2009 will be in Orlando, Florida, from October
25-27. The deadline for submitting papers, workshops and panel
proposals is March 27; April 24th is the deadline for posters,
demonstrations and WikiFest (practical experience) proposals.
Topics of interest include:
* social software for collaboration and work group processes
* wiki user experiences, usability, and discourse analysis
* reputation systems, quality assurance processes
* scalability---social and technical
* wiki technologies and implementations
* translation and multilingual wiki content
* educational applications
* wiki for non-textual media (images, video, audio)
* content dynamics and wiki evolution
* wiki journalism
* wiki archiving and versioning
* wiki administration: dealing with abuse and resolving conflict
* wiki and the semantic web, knowledge management, tacit knowledge
* wiki for small audiences (departmental and family wikis)
* legal issues (copyright, licensing)
* visualization of wiki structure
* wiki fiction
For more information, see the Call for Papers:
WikiSym is an annual conference devoted to research into all aspects
of wikis, including wiki communities, wiki software and technology,
and using wikis in education and organizations. Research papers about
the Wikimedia projects are welcome! Papers are peer reviewed and
archived in the ACM digital library (see past proceedings:
The conference is colocated with OOPSLA 2009. For more, see:
-- Phoebe Ayers (2009 Wikimedia Liaison)
Just forwarding on an announcement from our Head of Community Giving
regarding the public availability of raw data from our online
fundraiser. This may be interesting for researchers to explore
donation patterns to Wikipedia. We'd certainly love to see such
analysis. Please feel free to forward this, and if you want to work on
any specific analysis, please add a note to the bottom of this
- - -
Just announcing the data dumps for our fundraiser data for the 2008
As always, we like to provide our anonymized data to the community, both
to show our openness and transparency and to ask for the
community's help in doing analysis and data crunching to aid our future
The dumps are located here: http://download.wikipedia.org/fundraising/2008/
The main gift data file is 2008_Fundraiser.csv.gz. The fields are
date stamp, USD donation amount, original currency amount, original
currency, country of donation, and payment type.
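As a starting point for anyone who wants to explore the gift data, here is a rough sketch that sums USD donations per country. It assumes the columns appear in exactly the order listed above and that the file is comma-separated UTF-8; the column names in `FIELDS`, the function names, and the sample rows are invented for this example and are not taken from the dump itself.

```python
import csv
import gzip
import io
from collections import defaultdict

# Column order as listed in the announcement; the names are our own labels.
FIELDS = ["date", "usd_amount", "orig_amount",
          "orig_currency", "country", "payment_type"]


def totals_by_country(text_stream):
    """Sum USD donation amounts per country from a CSV text stream."""
    totals = defaultdict(float)
    for row in csv.DictReader(text_stream, fieldnames=FIELDS):
        try:
            amount = float(row["usd_amount"])
        except (TypeError, ValueError):
            continue  # skip a header row or a malformed line, if present
        totals[row["country"]] += amount
    return dict(totals)


def totals_from_gzip(path):
    """Same as above, reading directly from a gzip-compressed CSV file."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as fh:
        return totals_by_country(fh)


# Made-up rows, only to show the call; this is not real donation data.
sample = io.StringIO(
    "2008-11-05 10:00:00,30.00,30.00,USD,US,cc\n"
    "2008-11-05 10:01:00,12.50,10.00,EUR,DE,pp\n"
    "2008-11-05 10:02:00,20.00,20.00,USD,US,pp\n"
)
print(totals_by_country(sample))
```

For the real file, something like `totals_from_gzip("2008_Fundraiser.csv.gz")` should work once the dump has been downloaded, provided the layout assumptions above hold.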
The main data on the site notices and their effectiveness is
The fields here are template (the name of the site notice we used),
tracking source (where the click came from), campaign (an internal
tag), USD converted amount, original currency, and country of
If any in the community could help analyze the site notice data for
effectiveness, it would be greatly appreciated.
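One simple first pass at the site notice data could be a per-template summary: number of donations, total USD, and mean USD per donation. The sketch below assumes each record is already parsed into a sequence with the template name in position 0 and the USD amount in position 3, following the field order given above. The function name and the sample records are hypothetical. Note that the listed fields do not include impressions, so this measures donation volume per notice rather than a conversion rate.

```python
from collections import defaultdict


def summarize_by_template(rows):
    """Aggregate donation count, USD total and USD mean per site notice.

    rows: iterable of sequences laid out as
    (template, tracking_source, campaign, usd_amount, ...).
    """
    stats = defaultdict(lambda: {"donations": 0, "usd_total": 0.0})
    for row in rows:
        template = row[0]
        usd_amount = float(row[3])
        stats[template]["donations"] += 1
        stats[template]["usd_total"] += usd_amount
    return {
        name: {
            "donations": s["donations"],
            "usd_total": s["usd_total"],
            "usd_mean": s["usd_total"] / s["donations"],
        }
        for name, s in stats.items()
    }


# Invented records for illustration only; template names are placeholders.
records = [
    ("notice_a", "sidebar", "c1", "25.00", "USD", "US"),
    ("notice_a", "banner", "c1", "15.00", "USD", "CA"),
    ("notice_b", "banner", "c2", "50.00", "EUR", "FR"),
]
for name, summary in sorted(summarize_by_template(records).items()):
    print(name, summary)
```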
Also, I would like to thank the Wikimedia tech team for helping make
If you have any questions, please send them my way.
Head of Community Giving
Phone: 415.839.6885 x615
“At some future time, I hope to have something witty,
intelligent, or funny in this space.”
Call for Papers
Hawaii International Conference on System Sciences (HICSS)
Research 2.0: Web 2.0 and Virtual Worlds as Research Environments
Please visit the minitrack website for more details:
Part of the Track: Internet and Digital Economy
Paper Submission Deadline: June 15, 2009
The Web 2.0 environment offers many new opportunities for researchers
undertaking both qualitative and quantitative research (e.g.,
increased potential for collecting data from online communities and
social networks around the globe). However, these technologies can
also be used by researchers to enhance the research process (e.g.,
facilitating research collaboration between project team members to
develop tools and technologies to analyze data and write papers).
Understanding the privacy and legal implications in both contexts –
i.e., the implementation of the study, as well as the research process
– is an area that warrants further exploration in a minitrack
This minitrack invites papers on topics including (but not limited to):
Changing landscape for qualitative and quantitative research due to
emergence of Web 2.0 and virtual worlds;
Development of online research communities;
Online collaborative techniques in Web 2.0 environments for advancing
Use of Web 2.0 tools and technologies in data collection and analyses;
Use of Web 2.0 platforms and virtual worlds (e.g., Second Life) such
as avatars, online communities, for conducting qualitative and
Effectiveness of Web 2.0 for increasing participation rates in
research (e.g., questionnaire response rates; online focus groups);
Using user-generated content as a data source in research;
Ethical and legal issues (e.g., privacy; copyright) in conducting
qualitative and quantitative research in virtual environments;
Use of social computing in building research communities;
Role of social computing in the advancement of data collection techniques;
New data collection approaches in Web 2.0 environments.
Lisa M. Given
School of Library and Information Studies
International Institute for Qualitative Methodology (IIQM)
University of Alberta
School of Library and Information Studies
University of Alberta
We have been completely overrun by registrations for the developer meet-up in
Berlin. That’s exhilarating, but forces on me the sad duty to tell you: we are
out of room, we are closing registration early.
So: if you have not yet sent a registration mail, you will not be able to attend!
Sorry. We may even have to reject some registrations we have already received.
There’s some good news too, though: anyone interested may join us at the c-base
for the party on Saturday, March 4, starting at 8pm. The developers will be there
and people from the chapter and board meeting will also come. This will be a
good opportunity for getting to know Wikimedians from all over the world.
Forwarding this to wiki-research-l. Dmitry, you may want to subscribe to the
list to see replies <
---------- Forwarded message ----------
From: Dmitry Lizorkin <lizorkin(a)ispras.ru>
Date: Tue, Mar 10, 2009 at 12:21 PM
Subject: [WikiEN-l] community hierarchy of the Wikipedia graph
We recently studied the properties of the English Wikipedia graph and
(1) the graph consists of dense subgraphs (so-called "graph communities")
that are in turn less densely connected to each other;
(2) Wikipedia articles falling into the same community exhibit more
semantic similarity to each other than randomly selected articles.
Encouraged by the above observations, I computed the community hierarchy for
the English Wikipedia:
The hierarchy shows the grouping of similar Wikipedia articles into
communities, based purely on Wikipedia link information, and reflects the
link structure of the Wikipedia graph.
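To make the idea of "dense subgraphs that are less densely connected to each other" concrete, here is a small sketch that scores a given grouping of nodes with Newman's modularity Q, a widely used quality measure for graph communities. This is not necessarily the method used for the hierarchy above, which the message does not describe. The sketch also treats links as undirected and unweighted, which is a simplification since Wikipedia links are directed. The function name and the toy graph are invented for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # sum of node degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Q = sum over communities of (L_c / m) - (d_c / 2m)^2
    return sum(
        internal[c] / m - (degree_sum[c] / (2.0 * m)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles joined by a single link (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_group = {node: 1 for node in two_groups}
print(modularity(edges, two_groups))  # higher: grouping matches the dense parts
print(modularity(edges, one_group))   # zero: no structure captured
```

A grouping that follows the dense parts of the graph scores higher than one that lumps everything together, which is the property community-detection algorithms try to maximize.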
In your opinion, could such data organization be helpful for navigation and
finding related information in Wikipedia?
Your feedback is welcome!
WikiEN-l mailing list
To unsubscribe from this mailing list, visit: