Data on editathons held in each Wikipedia Language?


Maximilian Klein

7 Dec 2015

10:34 p.m.

Researchians,

I have been collecting data on the gendered biographies of different Wikipedia languages from Wikidata dumps, to try to understand the gender gap in content. After reading about Propensity Score Matching[1] today, I see it would be possible to test a (close to) causal link between the genders of the Wikipedia biographies being added to a language and editathon activity. Yet we'd need the data on editathon activity. Is it compiled somewhere, or can you think of how it could be compiled?

[1] https://en.wikipedia.org/wiki/Propensity_score_matching The idea in propensity score matching is to pretend a randomized experiment is being conducted, and to find a "control group" - a similar but untreated language, for each "treated group".

Make a great day, Max Klein ‽ http://notconfusing.com/
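
To make the matching idea in [1] concrete, here is a minimal sketch of the pairing step only. It assumes a propensity score (the estimated probability that a language edition hosted editathons, given its covariates) has already been computed for each language. The language names, the scores, and the helper `match_controls` are all hypothetical and made up for illustration; estimating the scores themselves is not shown.

```python
def match_controls(scores, treated):
    """Pair each treated unit with the untreated unit whose propensity
    score is closest (1-nearest-neighbour matching, with replacement)."""
    controls = [unit for unit in scores if unit not in treated]
    if not controls:
        raise ValueError("no untreated units to match against")
    pairs = {}
    for unit in sorted(treated):
        # The best control is the one with the smallest score difference.
        pairs[unit] = min(controls, key=lambda c: abs(scores[c] - scores[unit]))
    return pairs


# Hypothetical scores; "treated" means the language had editathon activity.
scores = {"lang_a": 0.81, "lang_b": 0.78, "lang_c": 0.35,
          "lang_d": 0.40, "lang_e": 0.10}
treated = {"lang_a", "lang_c"}

pairs = match_controls(scores, treated)
print(pairs)
```

Matching with replacement means one untreated language may serve as the control for several treated ones; whether that is acceptable depends on the analysis.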


Jonathan Morgan

8 Dec 2015

12:45 p.m.

I don't personally know of any central repository for data on past edit-a-thons.

There might be something out there. You could probably get some information from pinging folks in CE who've worked on Project & Event Grants (Asaf Bartov, Kacie Harold) or Program Evaluation (Amanda Bittaker, Edward Galvez), or search through past grant reports... but I'm guessing the data will be sparse and inconsistent, as it is still collected in a somewhat ad-hoc fashion.

If WMF were to support the development and maintenance of standardized infrastructure for edit-a-thon tracking--something like Harsh Kothari and Jeph Paul's platform for the Indian Wikiwomen edit-a-thons (site http://2015.wikiwomen.in/, code https://github.com/cosmiclattes/wikiwomen/tree/master)--this would be easier. But AFAIK that hasn't happened. If someone takes up that cause I will voice my support.


Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

-- Jonathan T. Morgan Senior Design Researcher Wikimedia Foundation User:Jmorgan (WMF) https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)

Ivan Martínez

1:20 p.m.

We held three women-only editathons in Mexico during 2015. I can send you the Wikimetrics data for them.


-- Iván Martínez, President - Wikimedia México A.C., User:ProtoplasmaKid, @protoplasmakid. We have created the largest collection of shared knowledge. Help protect Wikipedia, donate now: https://donate.wikimedia.org

Kerry Raymond

6:42 p.m.

New subject: Data on editathons held in each Wikipedia Language?

I would have to say that it might be realistic to gather statistics for a small event held in one location but for any large event, with multiple physical locations and/or online participation, it would be very difficult. Having been involved in supporting real-world events, I am well aware that many people believe the organisers have nothing else to do but gather statistics. In fact, you are running around like a headless chook the whole day because there are so many things to be done to run the event at all and there are usually too few organisers/helpers relative to the number of mostly newbie participants, so statistics gathering is the last thing on your mind.

Also some people are contributing because they are participants of the editathon but there are also contributions (both helpful and unhelpful) from other members of the community who are just reacting in their usual way to Wikipedia contributions and may not regard themselves as part of the editathon and/or may be completely unaware of it.

As a concrete example, Wikibomb 2014 (an editathon aimed at creating articles for Australian female scientists selected by the Australian Academy of Science) took place on a single day, with multiple physical sites in different cities plus online participants. (I was at one of the physical locations and was too busy to count the participants, but there were around 30 people.) We asked that participants add the category

https://en.wikipedia.org/wiki/Category:Wikibomb2014

to their articles (though obviously we cannot be sure that they did; most were new to Wikipedia editing and may not even have understood what we were asking them to do). Using the category, however, we do have a set of 118 articles that we know were created as part of the event. Some may have been created before or after the event day but still used the category, and so were presumably part of the event in terms of intent. Others may have been deleted subsequently: we had issues with sources published by the university or research institute employing the scientist, whose independence is questionable, and we had copyvios where bios were copied and pasted from university websites.
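
As a rough sketch of how such a category-based article set could be retrieved, the snippet below only builds a request URL for the MediaWiki Action API's `list=categorymembers` query; it does not perform the request. The helper name `category_members_url` and the choice of parameters are assumptions made for illustration, and deleted articles would not appear in the result.

```python
from urllib.parse import urlencode

# English Wikipedia's Action API endpoint.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def category_members_url(category, limit=500):
    """Build a query URL listing the main-namespace pages in a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,   # e.g. "Category:Wikibomb2014"
        "cmnamespace": 0,      # articles only, no talk or user pages
        "cmlimit": limit,      # larger result sets need API continuation
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


url = category_members_url("Category:Wikibomb2014")
print(url)
```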

...

From that set of 118 articles, you can probably analyse the edit histories and find the list of contributors in the first day or so, which should pick up most of the event participants (but also some others). You cannot rely on the first edit being the original participant's. I often made the first edit to create the article when people were being diverted into Articles for Creation, so the first edit may have been made by an experienced editor as a matter of practicality. (Tip: never use AfC at an event. The success of an event needs immediately visible articles at the end of the day, which is not possible with AfC.) But with a certain amount of visual inspection, you would probably be able to identify the person who contributed the most article text on that day, and that person would probably be a participant in the event. You might be able to automate that.

Kerry
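
One way the heuristic described above might be automated is sketched below. It assumes each article's revision history is already available as a list of dicts with `user`, `timestamp` (a `datetime`) and `size` (page size in bytes after the edit), and it uses net bytes added as a crude proxy for "most article text". The function name, the one-day window, and the sample revisions are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def likely_participant(revisions, window=timedelta(days=1)):
    """Return the user who added the most bytes within `window` of the
    article's first revision, or None if nobody added any text."""
    revs = sorted(revisions, key=lambda r: r["timestamp"])
    if not revs:
        return None
    cutoff = revs[0]["timestamp"] + window
    added = defaultdict(int)
    prev_size = 0
    for rev in revs:
        if rev["timestamp"] > cutoff:
            break  # ignore edits made after the event window
        delta = rev["size"] - prev_size
        if delta > 0:  # count additions only, not removals
            added[rev["user"]] += delta
        prev_size = rev["size"]
    return max(added, key=added.get) if added else None


# Made-up history: a helper creates a stub, a newcomer writes the article.
t0 = datetime(2000, 1, 1, 10, 0)
revisions = [
    {"user": "Helper", "timestamp": t0, "size": 200},
    {"user": "NewEditor", "timestamp": t0 + timedelta(hours=1), "size": 2500},
    {"user": "Patroller", "timestamp": t0 + timedelta(hours=2), "size": 2450},
    {"user": "NewEditor", "timestamp": t0 + timedelta(hours=5), "size": 3100},
    {"user": "LaterEditor", "timestamp": t0 + timedelta(days=2), "size": 5000},
]
print(likely_participant(revisions))
```

Byte counts cannot distinguish prose from templates or pasted references, so the result would still need the kind of manual check mentioned above.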


Gerard Meijssen

10 Dec 2015

6:46 a.m.

Hoi, Why not register such things in Wikidata? It makes searches easy and obvious. Thanks, Gerard


Maximilian Klein

11 Dec 2015

11:38 a.m.

Thanks, Jmo, Tilman, Kerry, and Ivan, I have some good direction now for collecting data.

Ziko, by "gendered biography" my definition (perhaps flawed) is a Wikidata item that has the property "instance of" with the value "human" (P31:Q5) and a "sex or gender" statement (P21). As for "editathon", I don't have a clear definition yet.

Make a great day, Max Klein ‽ http://notconfusing.com/
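
The definition above could be checked against one entity from a Wikidata JSON dump roughly as follows. This is a sketch, not the code actually used: the helper names are hypothetical, and it assumes that only P21 statements with a concrete item value count (statements marked "unknown value" or "no value" are skipped).

```python
def claim_item_ids(entity, prop):
    """Collect the item IDs (e.g. "Q5") that an entity states for `prop`."""
    ids = set()
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # assumption: skip "somevalue" / "novalue" statements
        value = snak.get("datavalue", {}).get("value", {})
        if isinstance(value, dict):
            if "id" in value:
                ids.add(value["id"])
            elif "numeric-id" in value:  # older dumps carry only the number
                ids.add("Q%d" % value["numeric-id"])
    return ids


def is_gendered_biography(entity):
    """True if the item is an instance of human (P31:Q5) and has a P21 value."""
    is_human = "Q5" in claim_item_ids(entity, "P31")
    has_gender = bool(claim_item_ids(entity, "P21"))
    return is_human and has_gender


def item_snak(qid):
    """Build a minimal item-valued statement for the example below."""
    return {"mainsnak": {"snaktype": "value",
                         "datavalue": {"type": "wikibase-entityid",
                                       "value": {"entity-type": "item", "id": qid}}}}


# Minimal made-up entity: a human with "sex or gender" set to female (Q6581072).
person = {"claims": {"P31": [item_snak("Q5")], "P21": [item_snak("Q6581072")]}}
no_gender = {"claims": {"P31": [item_snak("Q5")]}}

print(is_gendered_biography(person), is_gendered_biography(no_gender))
```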


Ziko van Dijk

8 Dec 2015

1:14 p.m.

Hello,

I am not sure whether I understood your first sentence correctly: what is a "gendered biography" of a Wikipedia language?

You use the term "editathon". What is this exactly? I have encountered at least three different uses:
* an informal meeting of Wikipedians, to sit together and work on their laptops, with the goal of producing content
* a Wikipedia training course, for non-Wikipedians
* a mixture of both, or a more formal meeting, e.g. of Wikipedians with GLAM people

You may find information about the related work in the chapters via the reports for the Annual Plan Grants.

Kind regards Ziko


Tilman Bayer

10 Dec 2015

6:29 a.m.

https://meta.wikimedia.org/wiki/Grants:Evaluation/Evaluation_reports/2015/Ed... may be of interest, which is based on an impressive data collection from 121 different editathons (https://meta.wikimedia.org/wiki/Grants:Evaluation/Evaluation_reports/2015/Ed... ); although it looks like it did not include article lists for each.


-- Tilman Bayer Senior Analyst Wikimedia Foundation IRC (Freenode): HaeB
