Pursuant to prior discussions about the need for a research
policy on Wikipedia, WikiProject Research is drafting a
policy regarding the recruitment of Wikipedia users to
participate in studies.
At this time, we have a proposed policy, and an accompanying
group that would facilitate recruitment of subjects in much
the same way that the Bot Approvals Group approves bots.
The policy proposal can be found at:
The Subject Recruitment Approvals Group mentioned in the proposal
is being described at:
Before we move forward with seeking approval from the Wikipedia
community, we would like additional input about the proposal,
and would welcome additional help improving it.
Also, please consider participating in WikiProject Research at:
University of Minnesota
Cross-posting this request to wiki-research-l. Anyone have data on
frequently used section titles in articles (any language), or know of
datasets/publications that examined this?
I'm not aware of any off the top of my head, Amir.
---------- Forwarded message ----------
From: Amir E. Aharoni <amir.aharoni(a)mail.huji.ac.il>
Date: Sat, Jul 11, 2015 at 3:29 AM
Subject: [Wikitech-l] statistics about frequent section titles
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Did anybody ever try to collect statistics about frequent section titles in
Wikimedia projects?
For Wikipedia, for example, titles such as "Biography", "Early life",
"Bibliography", "External links", "References", "History", etc., appear in
a lot of articles, and their counterparts appear in a lot of languages.
There are probably similar things in Wikivoyage, Wiktionary, and possibly
other projects.
Did anybody ever try to collect statistics of the most frequent section
titles in each language and project?
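No such dataset comes to mind, but heading counts are fairly easy to pull out of wikitext. A rough sketch (assuming the usual == Title == heading convention; not tied to any particular dump-parsing library):

```python
import re
from collections import Counter

# Wikitext headings look like "== Title ==", with the level given by
# the number of "=" characters (2 through 6).
HEADING_RE = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$", re.MULTILINE)

def section_title_counts(pages):
    """Count how often each section title appears across wikitext pages."""
    counts = Counter()
    for text in pages:
        for _level, title in HEADING_RE.findall(text):
            counts[title] += 1
    return counts

# Toy input; a real run would iterate over a dump of one language/project.
pages = [
    "== Early life ==\ntext\n== References ==\ntext",
    "== History ==\ntext\n== References ==\ntext",
]
print(section_title_counts(pages).most_common(1))
# [('References', 2)]
```

Running this per language and per project would give exactly the per-wiki frequency tables asked about above.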
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
“We're living in pieces,
I want to live in peace.” – T. Moore
Jonathan T. Morgan
Senior Design Researcher
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
I've been working for some time on graphs that visualize the entire edit
activity of a wiki. I'm documenting all of it at
The graphs can be viewed at
https://cosmiclattes.github.io/wikigraphs/data/wikis.html. Currently only
graphs for 'en' have been put up; I'll add graphs for the other wikis soon.
- The editors are split into groups based on the month in which they
made their first edit.
- The active edit sessions (as counts or percentages, etc.) for the groups
are then plotted as stacked bars or as a matrix. I've used the canonical
definition of an active edit session. The values are within ±0.1% of the
values on https://stats.wikimedia.org/
- There is a selector on each graph that lets you filter the data in the
graph. Moving the cursor to the left end of the selector turns it into a
resize cursor. The selection can then be moved or redrawn.
- In graphs 1,2 the selector filters by percentage.
- In graphs 3,4,5 the selector filters by the age of the cohort.
- Longevity of editors fell drastically starting in Jan 06 and has since
stabilized at the levels of Jan 07.
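To make the grouping concrete, here is a toy sketch of the cohort bucketing described above (hypothetical data; the real pipeline works from full edit histories):

```python
from collections import defaultdict

# Toy edit log: (editor_id, "YYYY-MM" month of the edit).
edits = [
    ("alice", "2006-01"), ("alice", "2006-03"),
    ("bob",   "2006-03"), ("bob",   "2006-04"),
    ("carol", "2006-03"),
]

# Cohort = month of the editor's first edit.
first_edit = {}
for editor, month in sorted(edits, key=lambda e: e[1]):
    first_edit.setdefault(editor, month)

# For each calendar month, collect the active editors per cohort --
# this is the per-cohort series that gets stacked in the bars.
activity = defaultdict(lambda: defaultdict(set))
for editor, month in edits:
    activity[month][first_edit[editor]].add(editor)

for month in sorted(activity):
    stacks = {cohort: len(editors) for cohort, editors in activity[month].items()}
    print(month, stacks)
```

Each printed row corresponds to one stacked bar: the month on the x-axis, and one stack segment per cohort.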
I would love to hear what you think of the graphs, and any ideas you might
have for me.
With 8% more editors contributing over 100 edits in June 2015 than in June
2014 <https://stats.wikimedia.org/EN/TablesWikipediaEN.htm>, we have now
had six consecutive months where this particular metric of the core
community is looking positive. One or two months could easily be a
statistical blip, especially when you compare calendar months that may have
five weekends in one year and four in the next. But six months in a row does begin
to look like a change in pattern.
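For what it's worth, the year-over-year comparison itself is just the following (toy counts, not the real figures from stats.wikimedia.org):

```python
# Number of editors with >100 edits in the two comparison months
# (illustrative numbers only).
june_2014 = 3000
june_2015 = 3240

# Year-over-year change in this metric, as a percentage.
change = (june_2015 - june_2014) / june_2014 * 100
print(f"{change:+.0f}%")  # +8%
```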
As far as caveats go, I'm aware of several reasons why raw edit count
is a suspect measure, but I'm not aware of anything that came in
this year that would have artificially inflated edit counts and moved
more of the under-100 editors into the >100 group.
I know there was a recent speedup, which should increase subsequent edit
rates, and one of the edit filters got disabled in June, but neither of
those should be relevant to the Jan-May period.
Would anyone on this list be aware of something that could otherwise have
thrown that statistic off?
Otherwise I'm considering submitting something to the Signpost.
Asaf Bartov has announced the WMF initiative: Community Capacity Development
on the Wikimedia-l mailing list. The thread starts here:
The initiative can be found here:
As experimentation is mentioned in parts of this, I have asked a question
about the freedom to experiment, the engineering resources to support
experiments, and (possibly involving some of you) the support of WMF for the
qualitative and quantitative collection and analysis of data arising from
such experiments.
We speculate a lot on this list about what might make a difference, but
generally that's all we can do, as we have no way to test an idea. So I am
genuinely curious to know whether the resourcing is there to support
experimentation.
I also note that the on-wiki pages invite ideas. It occurred to me that
there might be scope for re-using ideas that have been put forward on this
list.
There is a lot of knowledge on quality in online databases. It is known
that all of them have a certain error rate. This is true for Wikidata as
much as any other source.
My question is: is there a way to track Wikidata quality improvements over
time? I have blogged about one approach; however, it is an approach to
improve quality, not an approach to determine quality and track the
improvement of quality.
The good news is that there are many dumps of Wikidata so it is possible to
compare current Wikidata with how it was in the past.
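As a toy illustration of that dump-to-dump comparison, one could track a simple proxy metric such as the share of statements that carry a reference. The metric and the dump structure below are simplified assumptions, not the real Wikidata dump format:

```python
# Two toy "dumps": item id -> list of statements, where each statement
# records the references attached to it. Real Wikidata JSON dumps are far
# richer; this only sketches the comparison idea.
def referenced_share(dump):
    """Fraction of statements that cite at least one reference."""
    total = referenced = 0
    for statements in dump.values():
        for statement in statements:
            total += 1
            referenced += bool(statement.get("references"))
    return referenced / total if total else 0.0

old_dump = {"Q1": [{"references": []}, {"references": ["ref"]}]}
new_dump = {"Q1": [{"references": ["ref"]}, {"references": ["ref"]}],
            "Q2": [{"references": []}]}

print(f"{referenced_share(old_dump):.2f} -> {referenced_share(new_dump):.2f}")
# 0.50 -> 0.67
```

Computing such a metric over the sequence of historical dumps would give one quality curve over time.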
Would this be something that makes sense to get into for Wikimedia
research, particularly in light of Wikidata becoming more easily
available to Wikipedia?
I was at OpenSym yesterday (many thanks to Dirk for organizing this!), and
I was chatting with some people about attribution of content to its authors
in a wiki.
So I got inspired, and I cleaned up some code that Michael Shavlovsky and I
had written for this:
The way to use it is super simple (see below). The attribution object can
also be serialized and de-serialized to/from json (see documentation on
The idea behind the code is to attribute the content to the *earliest
revision* where the content was inserted, not the latest as diff tools
usually do. So if some piece of text is inserted, then deleted, then
re-inserted (in a revert or a normal edit), we still attribute it to the
earliest revision. This is somewhat similar to what we tried to do in
WikiTrust, but it's better done, and far more efficient.
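For intuition, here is a deliberately naive sketch of the earliest-insertion idea. This is not the actual algorithm, which matches tokens in context (N-grams) and is far more efficient; attributing each token in isolation, as below, mishandles repeated words:

```python
def attribute(revisions):
    """Attribute each token of the final revision to the earliest
    revision in which that token ever appeared."""
    earliest = {}  # token -> label of the revision where it first appeared
    for label, text in revisions:
        for token in text.split():
            earliest.setdefault(token, label)
    _last_label, last_text = revisions[-1]
    return [earliest[token] for token in last_text.split()]

revs = [
    ("rev0", "I like to eat pasta"),
    ("rev1", "I like to eat pasta with tomato sauce"),
    ("rev2", "I like to eat pasta"),                   # sauce deleted
    ("rev3", "I like to eat rice with tomato sauce"),  # sauce re-inserted
]
print(attribute(revs))
# ['rev0', 'rev0', 'rev0', 'rev0', 'rev3', 'rev1', 'rev1', 'rev1']
```

Note how "with tomato sauce" is credited to rev1, where it was first inserted, even though rev2 deleted it and rev3 re-inserted it.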
The algorithm details can be found in
I hope this might be of interest!
a = authorship_attribution.AuthorshipAttribution.new_attribution_processor(N=4)
a.add_revision("I like to eat pasta".split(), revision_info="rev0")
a.add_revision("I like to eat pasta with tomato sauce".split(),
               revision_info="rev1")
a.add_revision("I like to eat pasta".split(), revision_info="rev2")
a.add_revision("I like to eat rice with tomato sauce".split(),
               revision_info="rev3")
# Attribution of the tokens of the latest revision -- "with tomato sauce"
# is credited to rev1, where it was first inserted, even though rev2
# deleted it and rev3 re-inserted it:
['rev0', 'rev0', 'rev0', 'rev0', 'rev3', 'rev1', 'rev1', 'rev1']