I would have to say that it might be realistic to gather statistics for a small event held in one location, but for any large event, with multiple physical locations and/or online participation, it would be very difficult. Having been involved in supporting real-world events, I am well aware that many people believe the organisers have nothing else to do but gather statistics. In fact, you are running around like a headless chook the whole day. There are so many things to be done just to run the event at all, and there are usually too few organisers/helpers for the number of participants, most of whom are newbies. Statistics gathering is the last thing on your mind.

Also, some people contribute because they are participants in the editathon, but there are also contributions (both helpful and unhelpful) from other members of the community who are simply reacting to new Wikipedia contributions in their usual way and may not regard themselves as part of the editathon, or may be completely unaware of it.

As a concrete example, Wikibomb 2014 (an editathon aimed at creating articles about Australian female scientists selected by the Australian Academy of Science) had multiple physical sites in different cities plus online participants, and took place on a single day. (I was at one of the physical locations and was too busy to count the participants, but there were around 30 people.) We asked that participants add the category

https://en.wikipedia.org/wiki/Category:Wikibomb2014

to their articles (but obviously we cannot be sure that they did; most were new to Wikipedia editing and may not even have understood what we were asking them to do). However, using the category, we do have a set of 118 articles that we know were created as part of the event. Some may have been created in advance of, or after, the event but still used the category; presumably they were part of the event in terms of intent. Some may also have been deleted subsequently: we had issues with sources being the university or research institute employing the scientist, which makes their independence questionable, plus copyvios where bios from university websites were copied and pasted, etc.

From that set of 118 articles, you could probably analyse the edit histories and find the list of contributors during the first day or so, which should pick up most of the event participants (but also some others). You cannot rely on the first edit being made by the participant. I often made the first edit to create the article when people were being diverted into Articles for Creation (tip: never use AfC at an event; a successful event needs articles that are visible immediately at the end of the day, which is not possible with AfC), so the first edit may be made by an experienced editor as a matter of practicality. But with a certain amount of visual inspection, you would probably be able to identify the person who contributed the most article text on that day, and that person would probably be a participant in the event. You might be able to automate that.
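One way that last step might be automated, as a rough sketch only: given one article's revision history (timestamp, user, page size in bytes, oldest first), credit each user with the bytes their edits added during the first day after the page was created, and pick the user with the largest total. Byte growth is an assumed proxy for "contributed the most article text" (it ignores rewording and would also credit copy-and-paste), and the one-day window, the `Revision` type, the `likely_participant` function and the sample data are all hypothetical. The histories themselves could probably be pulled from the MediaWiki Action API (`list=categorymembers` on Category:Wikibomb2014, then `prop=revisions` with `rvprop=timestamp|user|size` and `rvdir=newer`); that fetching step is left out here.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import NamedTuple, Optional, Sequence


class Revision(NamedTuple):
    # Hypothetical record shape for one revision of an article.
    timestamp: datetime
    user: str
    size: int  # page size in bytes after this revision


def likely_participant(revisions: Sequence[Revision],
                       window: timedelta = timedelta(days=1)) -> Optional[str]:
    """Return the user who added the most bytes within `window` of page creation.

    Assumes `revisions` is sorted oldest first. Only positive size changes count,
    so removals and reverts do not reduce anyone's total.
    """
    if not revisions:
        return None
    cutoff = revisions[0].timestamp + window
    added = defaultdict(int)
    prev_size = 0
    for rev in revisions:
        if rev.timestamp > cutoff:
            break  # later edits are treated as outside the event
        delta = rev.size - prev_size
        if delta > 0:
            added[rev.user] += delta
        prev_size = rev.size
    return max(added, key=added.get) if added else None


# Made-up history: an experienced helper creates a stub, a newcomer writes the
# article, and a bot tidies it up two days later (date chosen arbitrarily).
base = datetime(2014, 1, 1, 10, 0)
history = [
    Revision(base, "HelperEditor", 200),
    Revision(base + timedelta(hours=1, minutes=30), "NewEditor1", 4200),
    Revision(base + timedelta(hours=5), "NewEditor1", 5200),
    Revision(base + timedelta(days=2), "SomeBot", 5300),
]
print(likely_participant(history))  # NewEditor1
```

Anything picked out this way would still deserve a visual check, for example where a helper pasted in text on a newcomer's behalf.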

Kerry

From: Wiki-research-l [mailto:wiki-research-l-bounces@lists.wikimedia.org] On Behalf Of Jonathan Morgan
Sent: Wednesday, 9 December 2015 3:46 AM
To: Research into Wikimedia content and communities <wiki-research-l@lists.wikimedia.org>
Cc: Harsh Gupta <gupta.harsh96@gmail.com>
Subject: Re: [Wiki-research-l] Data on editathons held in each Wikipedia Language?

I don't personally know of any central repository for data on past edit-a-thons.

There might be something out there. You could probably get some information by pinging folks in CE who've worked on Project & Event Grants (Asaf Bartov, Kacie Harold) or Program Evaluation (Amanda Bittaker, Edward Galvez), or by searching through past grant reports... but I'm guessing the data will be sparse and inconsistent, as it is still collected in a somewhat ad hoc fashion.

If WMF were to support the development and maintenance of standardized infrastructure for edit-a-thon tracking, along the lines of Harsh Kothari and Jeph Paul's platform for the Indian Wikiwomen edit-a-thons (site, code), this would be easier. But AFAIK that hasn't happened. If someone takes up that cause, I will voice my support.

On Mon, Dec 7, 2015 at 7:34 PM, Maximilian Klein <isalix@gmail.com> wrote:

Researchians,
I have been collecting data on the gender of biographies in different Wikipedia languages from Wikidata dumps, with the aim of understanding the gender gap in content. After reading about propensity score matching[1] today, I see it would be possible to test a (close to) causal link between the gender of biographies being added to a Wikipedia language edition and editathon activity. But we'd need data on editathon activity. Is it compiled somewhere, or can you think of how it could be compiled?

[1] https://en.wikipedia.org/wiki/Propensity_score_matching The idea in propensity score matching is to pretend that a randomized experiment is being conducted, and to find, for each "treated" language, a "control": a similar but untreated language.
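A minimal sketch of the matching step only, assuming the propensity scores (each language's estimated probability of being "treated", e.g. having editathon activity, given its covariates) have already been estimated, for instance with a logistic regression. Each treated language is paired with the untreated language whose score is closest (nearest-neighbour matching with replacement), and the effect on the treated is the average outcome gap across pairs. The function names, language labels, scores and outcomes are all hypothetical.

```python
from statistics import mean
from typing import Dict


def match_controls(treated: Dict[str, float],
                   controls: Dict[str, float]) -> Dict[str, str]:
    """Pair each treated unit with the control whose propensity score is closest.

    Nearest-neighbour matching *with replacement*: one control may serve as the
    match for several treated units.
    """
    if not controls:
        raise ValueError("need at least one control unit")
    return {
        name: min(controls, key=lambda c: abs(controls[c] - score))
        for name, score in treated.items()
    }


def att(matches: Dict[str, str], outcome: Dict[str, float]) -> float:
    """Average treatment effect on the treated: mean outcome gap over matched pairs."""
    return mean(outcome[t] - outcome[c] for t, c in matches.items())


# Hypothetical propensity scores and outcomes (e.g. share of new biographies about women).
treated = {"lang_A": 0.72, "lang_B": 0.35}
controls = {"lang_X": 0.70, "lang_Y": 0.40, "lang_Z": 0.10}
outcome = {"lang_A": 0.20, "lang_B": 0.18, "lang_X": 0.15, "lang_Y": 0.16, "lang_Z": 0.12}

pairs = match_controls(treated, controls)
print(pairs)                          # {'lang_A': 'lang_X', 'lang_B': 'lang_Y'}
print(round(att(pairs, outcome), 3))  # 0.035
```

Matching with replacement keeps the closest control for every treated unit at the cost of reusing some controls; a caliper (maximum allowed score gap) is a common refinement not shown here.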

Make a great day,
Max Klein ‽ http://notconfusing.com/

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Jonathan T. Morgan

Senior Design Researcher

Wikimedia Foundation

User:Jmorgan (WMF)