I have processed most of the RfA data of the German Wikipedia with an
imperfect workflow of
1. a Perl script that fetches the data (http://en.wikipedia.org/wiki/Perl),
2. a manual correction of exceptions and
3. a validation and import by a GNU R script (www.r-project.org/).
Because of exceptions the process cannot, as far as I can see, be fully
automated. The Perl script is just a quick hack and I had to change it
during the process. So it remains a lot of manual work as long as
there is no better implementation of votes in the wiki engine.
Contact me if you want me to send it to you.
(There are many questions that I wanted to answer but wasn't able to
(-; ) I am interested in the simple question: What determines
the vote decisions of individuals in WP?
I expect "pro" votes to be
1. a mixture of expressions of TRUST in (a) the candidate, (b) the
nominating user or (c) users who have voted before.
2. a boundedly rational (and especially affectively distorted) decision about
the expected benefit of a candidate
(a) according to OWN political and organisation-political INTERESTS and
(b) according to the perceived benefit to ORGANISATIONAL INTERESTS.
I am about to test the effect of different user variables (aggregated
from logged user actions like revisions and user relations that can be
accessed from within the Toolserver, https://wiki.toolserver.org/view/Main_Page)
on vote decisions.
As far as I can see, that is complementary to surveying the voters, which
is what Ben is about to do. From a sociological point of view, what
participants think and say need not be exactly what they do.
Best wishes from Germany
So far, yes, you're right, except for DE. We have included
Which is precisely a processed dump of the logging.xml dump, with all the info about the logged events you mention.
In the coming days (as soon as I'm able to find some free slots) we will add support for the logged info in all remaining languages.
--- On Mon, 29/6/09, Marc Schwenzer <hkiws(a)gmx.de> wrote:
> From: Marc Schwenzer <hkiws(a)gmx.de>
> Subject: Re: [Wiki-research-l] Public repositories for research dumps
> To: "Research into Wikimedia content and communities" <wiki-research-l(a)lists.wikimedia.org>
> Date: Monday, 29 June 2009 11:57
> Hi Felipe,
> Thank you for your work.
> As far as I understood from the project page there are only
> revisions and not other logged actions like protects, user
> etc. in the WikiXRay-Dump. Am I right about that?
> Best greetings
> Wiki-research-l mailing list
I'm part of a team at Carnegie Mellon University that is studying the
Request for Adminship process and would like to run a survey on RfA voters.
Does anyone have any suggestions on how to successfully solicit users to
take a survey without being intrusive? Any suggestions at all on conducting
the survey within Wikipedia would be helpful.
Carnegie Mellon University
A few hours ago, a new public repository was created to host WikiXRay database dumps, containing info extracted from public Wikipedia dbdumps. The image is hosted by RedIRIS (in short, the Spanish equivalent of Kennisnet in the Netherlands).
These new dumps aim to save other researchers time and effort, since they won't need to parse the complete XML dumps to extract all relevant activity metadata. We used mysqldump to create the dumps from our databases.
As of today, only some of the biggest Wikipedias are available. However, in the coming days the full set of available languages will be ready for download. The files will be updated regularly.
The procedure is as follows:
1. Find the research dump you are interested in. Download and decompress it on your local system.
2. Create a local DB to import the information.
3. Load the dump file, using a MySQL user with insert privileges (the client will prompt for the password):
$> mysql -u user -p myDB < dumpfile.sql
And you're done.
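For those who prefer to script step 3, here is a minimal Python sketch. It is not part of WikiXRay; the function names build_import_command and import_dump are made up for illustration. It assumes the mysql client is on the PATH, and it passes -p without an inline password so that the client prompts for it.

```python
import subprocess
from pathlib import Path
from typing import List


def build_import_command(user: str, database: str) -> List[str]:
    """Return the mysql client argument list for importing a dump.

    A bare -p makes the client prompt for the password, so the
    password never shows up in the shell history or process list.
    """
    return ["mysql", "-u", user, "-p", database]


def import_dump(dump_path: str, user: str, database: str) -> None:
    """Feed the decompressed SQL dump to the mysql client on stdin."""
    with Path(dump_path).open("rb") as dump:
        # check=True raises CalledProcessError if the import fails.
        subprocess.run(build_import_command(user, database),
                       stdin=dump, check=True)
```

Calling import_dump("dumpfile.sql", "user", "myDB") would then correspond to the command shown in step 3.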
Final warning. 3 fields in the revision table are not reliable yet:
All remaining fields/values are reliable (in particular rev_len, rev_num_words, and so forth).
I am a researcher in the GroupLens lab (http://grouplens.org) at the
University of Minnesota. You might recognize our previous work in
Wikipedia like "Creating, Destroying, and Restoring Value in Wikipedia"
As part of our continuing work within Wikipedia, my colleagues and I are
conducting an academic (non-commercial) user study where we have
developed an interface modification that is designed to help users work
together more effectively by changing the interface for reverting other
editors. You can help us complete this study by installing and using
If you choose to participate in the study, you will be automatically
assigned a Wikipedia gadget that will consist of a subset of the
modifications we have developed. As part of the study, we will be
logging your usage of the tool (ie. when you are reverting other
Throughout the study, we will be available for tech support and bug
fixes. There will be a survey at the completion and the entire tool
will be made publicly available.
Consent form/installer: http://wikipedia.grouplens.org/NICE/consent/
Wikipedia user script page: http://en.wikipedia.org/wiki/User:EpochFail/NICE
University of Minnesota
This is just to announce that the final draft of my PhD thesis "Wikipedia: A quantitative analysis" is finished. Only minor appendices remain, on general background for some statistical methods that I applied.
It will (hopefully) be approved for presentation in just a few days, though bureaucracy will delay the "viva voce" until the middle of March (more or less).
It includes the first quantitative analysis comparing the top 10 language versions of Wikipedia, as of Dec. 2007 (to allow a fair comparison of EN with other languages). Among other interesting insights, it presents a complete study of the activity of logged authors, articles and talk pages, and the evolution over time of the distributions of key parameters (different authors per article, articles per author, revisions per author/article, etc.).
It also offers a more in-depth study of the inequality of contributions by logged authors, and also across articles. Likewise, it presents a complete survival analysis to examine the average lifetime of Wikipedia contributors, focusing on the transitions first contribution --> joining the core --> core membership --> leaving the core --> abandoning the project.
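To illustrate what measuring the inequality of contributions can look like, here is a minimal Python sketch of the Gini coefficient computed over per-author revision counts. This is only an assumption for illustration: the message does not say which inequality measure the thesis uses, and the function name gini and the sample numbers are made up.

```python
def gini(contributions):
    """Gini coefficient of a list of non-negative contribution counts.

    0.0 means every author contributed the same amount; values close
    to 1.0 mean a few authors account for almost all contributions.
    """
    values = sorted(contributions)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


# Hypothetical revision counts for four authors.
equal_share = gini([5, 5, 5, 5])     # everybody contributes equally
one_dominant = gini([0, 0, 0, 10])   # a single author does all the work
```

With only four authors the maximum possible value is (n - 1) / n = 0.75, which is what the second example yields.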
Finally, we also examine some very basic metrics for quality, analyze the common quantitative patterns of reputable authors and high-quality content, and try to infer the implications of all these findings for the sustainability of the Wikipedia workflow model in the coming years.
If any of you is interested in having a look at the (still draft) manuscript, I accept on-demand requests for access to the repo :).
I'll wait until after the public defense and the comments from reviewers to make a public summary of our conclusions.
Yes, this is indeed quite an interesting topic, due to the methodological challenge (how to get a representative sample for your study) and the interference with the community.
I second the proposal made by Alexander and Reid, and the rest of the previous comments. Here is some additional context from our long experience with this kind of initiative in volunteer-driven communities.
The main problem for researchers in this area is that it is:
1) Difficult to find community members willing to participate.
2) Problematic to find them without interfering with the normal course of life in the community (like "spamming" talk pages, no matter how good your initial intention was in doing it).
3) Even more important, essential not to "bother" them more than a reasonable number of times. The first study with the Apache community went relatively well (a response rate of 25% or so). The second attempt failed miserably. Volunteers want to spend their time maintaining Debian packages, dealing with Apache issues, providing new features for Gnome desktop apps, improving Wikipedia articles, etc. They don't want to spend too much time answering "yet another survey". And volunteer communities burn out quite quickly in this sense, trust me.
One example of a survey with rather high participation has been the Wikipedia Global Survey (as far as it seems), but of course it took major support from the WM Management Board itself. That would not be feasible in every case.
Honestly, despite our numerous previous experiences, I haven't got a perfect answer to these issues, but in the meantime, the opt-in experiments flag seems to be a good approach to try.
--- On Thu, 4/6/09, Reid Priedhorsky <reid(a)umn.edu> wrote:
> From: Reid Priedhorsky <reid(a)umn.edu>
> Subject: Re: [Wiki-research-l] WIkipedia proposal: internal IRB for research
> To: "Research into Wikimedia content and communities" <wiki-research-l(a)lists.wikimedia.org>
> Date: Thursday, 4 June 2009 1:01
> On 06/03/2009 05:29 PM, Alexander Foley wrote:
> > I think a better course of action might be to establish a Wikipedia
> > Experiments subgroup of users who opt-in to participate in
> > experiments, much like what Google does with its
> > features. You're limiting the sample quite a bit, and quite possibly
> > only getting involved or heavily involved Wikipedia users, but if your
> > core survey group is editors it would likely be
> I agree. I'd extend this notion a bit: my impression is that most people
> are more than happy to be solicited for studies (provided it doesn't
> happen too often). So I'd suggest two components:
> 1. The opt-in defines specifically how frequently one can be solicited
> (e.g. N times per year).
> 2. The opt-in is widely pushed: highly visible on account creation, and
> all existing users get one (1) invitation to opt in.
> I think #2 is important because a de facto policy that only
> involved Wikipedians participate in research would be severely limiting
> to the work we do.
FGWM 2009 - Deadline Extension (new date: June 15; notification July 13)
Workshop "Knowledge and Experience Management" of the knowledge
management working group
Workshop "Wissens- und Erfahrungsmanagement" der Fachgruppe
This workshop takes place in the context of the meeting of the
knowledge management working group (http://www.fgwm.de) of the German
Informatics Society (GI) and aims to provide an interdisciplinary
forum for both scientists and practitioners. The goal of the workshop is
the exchange of innovative ideas and practical applications in the field
of knowledge and experience management.
Submissions on current research in these and
adjacent areas are therefore welcome. Moreover, contributions that describe work
in progress or approaches that have not yet been investigated
comprehensively, thus having a provisional character, are of
interest. The latter should, however, be described sufficiently clearly
and in a structured way, in order to serve as a basis for interesting
discussions among the participants.
*Topics of Interest*:
* Experience/knowledge search and knowledge integration approaches
(case-based reasoning, logic-based approaches, text-based
approaches, semantic portals/wikis/blogs, Web 2.0, etc.)
* Applications of knowledge and experience management
(corporate memories, e-commerce, design, tutoring/e-learning,
e-government, software engineering, robotics, medicine, etc.)
* (Semantic) Web Services for knowledge management
* Agile approaches within the knowledge management domain
* Agent-based & Peer-to-Peer knowledge management
* Just-in-time retrieval and just-in-time knowledge capturing
* Ways of knowledge representation (ontologies, similarity, retrieval,
adaptive knowledge, etc.)
* Support of authoring and maintenance processes
(change management, requirements tracing, (distributed) version
* Evaluation of knowledge management systems
* Practical experiences ("lessons learned") with IT aided approaches
* Integration of knowledge management and business processes
---+ Important Dates (all times CEST)
* Submission of papers: 15th June 2009
* Notification of acceptance: 13th July 2009
* Camera ready copies due: 27th July 2009
* Workshop: 21st to 23rd September 2009
All papers submitted to the workshop will be reviewed. Submission is
electronic in PDF format via the EasyChair
Submitted papers must conform to the LWA style (see homepage).
Submitted papers should not exceed 8 pages. In case of questions do not
hesitate to contact the organizers.
You may also present work that has already been published elsewhere. In
that case, we ask for a one-page abstract referring to the original
* Christoph Lange [http://kwarc.info/clange/], Jacobs University Bremen
* Jochen Reutelshöfer [http://www.is.informatik.uni-wuerzburg.de/staff/reutelshoefer_jochen/],
University of Würzburg
---+ Programme Committee
* Klaus-Dieter Althoff, Universität Hildesheim
* Ralph Bergmann, Universität Trier
* Rainer Schmidt, Universität Rostock
* Mirjam Minor, Universität Trier
* Markus Nick, empolis GmbH
* Ioannis Iglezakis, Universität von Thessaloniki
* Steffen Staab, Universität Koblenz-Landau
* Joachim Baumeister, Universität Würzburg
* Michael Kohlhase, Jacobs-Universität Bremen
* Ulrich Reimer, University of Applied Sciences St. Gallen
* Thomas Roth-Berghofer, DFKI
Christoph Lange, Jacobs Univ. Bremen, http://kwarc.info/clange, Skype duke4701