Hi Piotr,
I would look into things such as geographic distribution (is Wikipedia
used more heavily in some regions of the world in general?) and
alternative projects (such as Baidu Baike in China) that might be more
popular with speakers of a given language. There might also be an effect
from people living abroad editing their native-language Wikipedia, but
that's just speculation. Along the same lines, if speakers of language
one move to places where language two is spoken, and language two
already has a big Wikipedia, that might also motivate them to edit the
other language more.
Best,
Lucie
On 24 July 2018 at 09:02, Piotr Konieczny <piokon(a)post.pl> wrote:
> Dear all,
>
> I am working on a paper on why/whether people contribute (or not) to
> collective intelligence projects differently in different countries. The
> paper was inspired, partially, by several discussions I had with various
> people on why different language Wikipedias have different sizes,
> besides (doh) the popularity of the language (and yes, English is the
> biggest because it is international; and yes, I am aware a few
> Wikipedias are outliers because of bots creating machine translations or
> auto-populating village articles and the like). But for example, Poland
> and South Korea have roughly similar populations/speaker numbers and
> development status, yet the Polish Wikipedia is over 3x the size of the
> Korean one, and no bot can account for that. So there's more to it. I am
> already feeding dozens of parameters into a spreadsheet for some
> modelling, but I a) wonder what I might have missed - before a reviewer
> asks 'why didn't you check for xyz' - and b) would like to have a few
> nice sentences about how things that people expect to matter do not (or
> vice versa). Hence my question to you all, in the form of this open
> question mini survey:
>
> Why do you think different language Wikipedias have different sizes,
> outside of the popularity of a given language?
>
> For reference, list of Wikipedias by size and language:
> https://meta.wikimedia.org/wiki/List_of_Wikipedias
>
> TIA!
>
> --
> Piotr Konieczny, PhD
> http://hanyang.academia.edu/PiotrKonieczny
> http://scholar.google.com/citations?user=gdV8_AEAAAAJ
> http://en.wikipedia.org/wiki/User:Piotrus
>
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
--
Lucie-Aimée Kaffee
Web and Internet Science Group
School of Electronics and Computer Science
University of Southampton
Forwarding in case this is of interest to anyone on the Analytics or
Research lists who doesn't subscribe to Wikitech-l or Xmldatadumps-l.
Pine
( https://meta.wikimedia.org/wiki/User:Pine )
---------- Forwarded message ----------
From: Ariel Glenn WMF <ariel(a)wikimedia.org>
Date: Fri, Jul 20, 2018 at 5:53 AM
Subject: [Wikitech-l] hewiki dump to be added to 'big wikis' and run with
multiple processes
To: Wikipedia Xmldatadumps-l <Xmldatadumps-l(a)lists.wikimedia.org>,
Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Good morning!
The pages-meta-history dumps for hewiki take 70 hours these days, the
longest of any wiki not already running with parallel jobs. I plan to add
it to the list of 'big wikis' starting August 1st, meaning that 6 jobs will
run in parallel, producing the usual numbered file output; see e.g. the
frwiki dumps for an example.
Please adjust any download/processing scripts accordingly.
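If it helps anyone adapting their scripts, here is a minimal sketch in Python. It assumes a hypothetical local mirror path and the numbered file naming used by the other 'big wikis' (e.g. frwiki); please verify the exact pattern against the actual hewiki dump directory before relying on it.

# Illustrative only: iterate over the numbered pages-meta-history parts
# that big wikis produce, instead of assuming a single history file.
import bz2
import glob
import os

DUMP_DIR = "/path/to/dumps/hewiki/20180801"  # hypothetical local mirror path

def history_files(dump_dir):
    """Return every numbered pages-meta-history part, sorted by name."""
    pattern = os.path.join(dump_dir, "hewiki-*-pages-meta-history*.xml*.bz2")
    return sorted(glob.glob(pattern))

def iterate_lines(dump_dir):
    """Stream the XML line by line across all parts, in order."""
    for path in history_files(dump_dir):
        with bz2.open(path, mode="rt", encoding="utf-8") as handle:
            for line in handle:
                yield line

if __name__ == "__main__":
    parts = history_files(DUMP_DIR)
    print(f"Found {len(parts)} history part(s): {parts}")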
Thanks!
Ariel
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hi everyone,
We're preparing the July 2018 research newsletter and are looking for contributors. Please take a look at https://etherpad.wikimedia.org/p/WRN201807 and add your name next to any paper you are interested in covering. Our target publication date is July 27 (UTC), although actual publication might happen several days later. As usual, short notes and one-paragraph reviews are most welcome.
Highlights from this month:
- Computing controversy: Formal model and algorithms for detecting controversy on Wikipedia and in search queries
- Digitale Methoden und Werkzeuge für Diskursanalysen am Beispiel der Wikipedia
- Evaluating lexical coverage in Simple English Wikipedia articles: a corpus-driven study
- Generating Wikipedia by Summarizing Long Sequences
- IMGPEDIA: A large-scale knowledge-base to perform visuo-semantic queries over Wikimedia Commons images
- Interactions and influence of world painters from the reduced Google matrix of Wikipedia networks
- Locating foci of translation on Wikipedia
- Modeling Deliberative Argumentation Strategies on Wikipedia
- Representing Metro Manila on Wikipedia
- Simulation Experiments on (the Absence of) Ratings Bias in Reputation Systems
- The Impact of Topic Characteristics and Threat on Willingness to Engage with Wikipedia Articles: Insights from Laboratory Experiments
- The_Tower_of_Babel.jpg: Diversity of Visual Encyclopedic Knowledge Across Wikipedia Language Editions
- Time-focused analysis of connectivity and popularity of historical persons in Wikipedia
- Traitors, Collaborators and Deserters in Contemporary European Politics of Memory
- Translation and the Production of Knowledge in Wikipedia: Chronicling the Assassination of Boris Nemtsov
- Using wikis in the higher education: The case of Wikipedia
- Vandalism on Collaborative Web Communities: An Exploration of Editorial Behaviour in Wikipedia
- What is the Commons Worth? Estimating the Value of Wikimedia Imagery by Observing Downstream Use
- What leads Ukrainian University students to use Wikipedia?
- Why do people search Wikipedia for information on multiple sclerosis?
Masssly, Tilman Bayer and Dario Taraborelli
[1] http://meta.wikimedia.org/wiki/Research:Newsletter
Dear all,
I am wondering whether the Wikimedia Commons data structure (ideally in
XML), along with its documentation and some sample data, is something one
could find online.
There is a team at ICS FORTH who have developed a mapping technology called
X3ML which allows declarative mappings between two data structures. The
idea would be to map the Wikimedia Commons data structure to the CIDOC CRM,
meant for heritage content users.
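In case it is useful as a starting point, here is a minimal sketch in Python that pulls one file's metadata in XML via the standard MediaWiki API on Commons, so the structure can be inspected before designing an X3ML mapping. The file title is just a placeholder example.

# Illustrative only: fetch metadata for one Commons file as XML.
import requests

API = "https://commons.wikimedia.org/w/api.php"

params = {
    "action": "query",
    "titles": "File:Example.jpg",  # placeholder title, replace as needed
    "prop": "imageinfo",
    "iiprop": "timestamp|user|url|mime|extmetadata",
    "format": "xml",
}

response = requests.get(API, params=params, timeout=30)
response.raise_for_status()
# Raw XML showing how imageinfo/extmetadata fields are structured.
print(response.text)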
Where could I find the Wikimedia Commons data structure, or whom could I
ask further about this?
Thank you very much in advance for any tips!
Best,
Trilce
--
:..::...::..::...::..:
Trilce Navarrete
m: +31 (0)6 244 84998 | s: trilcen | t: @trilcenavarrete
w: trilcenavarrete.com
[Adding some other mailing lists in Cc]
Hi everybody,
as many of you probably already noticed yesterday on the operations@
mailing list, we had an outage of the Kafka Main eqiad cluster that forced
us to switch the Eventbus and Eventstreams services to codfw.
All the precise timings will be listed in
https://wikitech.wikimedia.org/wiki/Incident_documentation/20180711-kafka-e…,
but as a quick summary:
2018-07-11 17:00 UTC - Eventbus service switched to codfw
2018-07-11 18:44 UTC - Eventstreams service switched to codfw
We are going to switch those services back to eqiad during the next couple
of hours. Consumers of the Eventstreams service may see some failures or
data drops; apologies in advance for the trouble.
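For consumers who want to ride out brief interruptions like this one, here is a minimal sketch in Python against the public recentchange stream (adjust the URL to whichever stream you actually consume): a client that simply reconnects when the connection drops.

# Illustrative only: a tolerant EventStreams (SSE) consumer that reconnects
# after a dropped connection, so a failover causes a short gap, not a crash.
import json
import time

import requests

STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchange"

def consume(url):
    while True:
        try:
            with requests.get(url, stream=True, timeout=60,
                              headers={"Accept": "text/event-stream"}) as resp:
                resp.raise_for_status()
                for line in resp.iter_lines(decode_unicode=True):
                    # SSE payload lines are prefixed with "data: ".
                    if not line or not line.startswith("data: "):
                        continue
                    try:
                        event = json.loads(line[len("data: "):])
                    except ValueError:
                        continue  # skip partial or malformed payloads
                    print(event.get("wiki"), event.get("title"))
        except requests.RequestException as err:
            print(f"stream interrupted ({err}); reconnecting in 5 seconds")
            time.sleep(5)

if __name__ == "__main__":
    consume(STREAM_URL)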
Cheers,
Luca
On Thu, 12 Jul 2018 at 00:00, Luca Toscano <ltoscano(a)wikimedia.org> wrote:
> Hi everybody,
>
> as you might have seen in the operations channel on IRC, the Kafka Main
> eqiad cluster (kafka100[1-3].eqiad.wmnet) suffered a long outage due to new
> topics being pushed with names that were too long (causing filesystem
> operation issues, etc.). I'll update this email thread tomorrow EU time
> with more details, tasks, the precise root cause, etc., but the important
> bit to know is that Eventbus and Eventstreams have been failed over to the
> Kafka Main codfw cluster. This should be transparent to everybody, but
> please let us know otherwise.
>
> Thanks for the patience!
>
> (a very sleepy :) Luca
>
>
Hi Everyone,
The next Wikimedia Research Showcase will be live-streamed on Wednesday,
July 11, 2018, at 11:30 AM PDT (18:30 UTC).
YouTube stream: https://www.youtube.com/watch?v=uK7AvNKq0sg
As usual, you can join the conversation on IRC at #wikimedia-research, and
you can watch our past research showcases here:
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#Upcoming_Showcase>
Hope to see you there!
This month's presentations:
Mind the (Language) Gap: Neural Generation of Multilingual Wikipedia
Summaries from Wikidata for ArticlePlaceholders
By *Lucie-Aimée Kaffee*
While Wikipedia exists in 287 languages, its content is unevenly distributed
among them. It is therefore of the utmost social and cultural interest to
address languages for which native speakers have access only to an
impoverished Wikipedia. In this work, we investigate the generation of
summaries for Wikipedia articles in underserved languages, given structured
data as an input.
In order to address the information bias towards widely spoken languages,
we focus on an important support for such summaries: ArticlePlaceholders,
which are dynamically generated content pages in underserved Wikipedia
versions. They enable native speakers to access existing information in
Wikidata, a structured Knowledge Base (KB). Our system provides a
generative neural network architecture, which processes the triples of the
KB as they are dynamically provided by the ArticlePlaceholder, and generates
a comprehensible textual summary. This data-driven approach is tested with
the goal of understanding how well it matches the communities' needs on two
underserved languages on the Web: Arabic, a language with a big community
with disproportionate access to knowledge online, and Esperanto.
With the help of the Arabic and Esperanto Wikipedians, we conduct an
extended evaluation which exhibits not only the quality of the generated
text but also the applicability of our end-system to any underserved
Wikipedia version.
Token-level change tracking: data, tools and insights
By *Fabian Flöck*
This talk first gives an overview of the WikiWho infrastructure,
which provides tracking of changes to single tokens (~words) in articles of
different Wikipedia language versions. It exposes APIs for accessing this
data in near-real time, and is complemented by a published static dataset.
Several insights are presented regarding provenance, partial reverts,
token-level conflict and other metrics that only become available with such
data. Lastly, the talk will cover several tools and scripts that are
already using the API and will discuss their application scenarios, such as
investigation of authorship, conflicted content and editor productivity.
Hello Wikimedians,
I'm happy to announce the launch of the Inspire Campaign on Measuring
Community Health.[1] The goal of this campaign is to gather your ideas on
approaches to measure or evaluate the experience and quality of
participating and interacting with others in Wikimedia projects.
So what is community health? Healthy projects promote high quality content
creation, respectful collaboration, efficient workflows, and effective
conflict resolution. Tasks and experiences that result in patterns of
editor frustration, poor editor retention, harassment, broken workflows,
and unresolved conflicts are unhealthy for a project.
As a movement, Wikimedians have always measured aspects of their
communities. Data points, such as editor activity levels, are regularly
collected. While these metrics provide some useful indications about the
health of a project, they do not give much insight into specific
challenges, areas needing improvement, or where a project has been
successful.
We want to hear from you what specific areas on your Wikimedia project
should be evaluated or measured, and how it should be done. Share your
ideas, contribute to other people’s submissions, and get involved in the
new Inspire Campaign. After the campaign, grants and other paths are
available to support the formal development of these measures and
evaluation techniques.[2]
Warm regards,
Sydney
[1]
https://meta.wikimedia.org/wiki/Special:MyLanguage/Grants:IdeaLab/Inspire
[2] https://meta.wikimedia.org/wiki/Grants:IdeaLab/Develop
--
Sydney Poore
Trust and Safety Specialist
Wikimedia Foundation
Trust and Safety team;
Anti-harassment tools team