If you use Hive on stat1002/1004, you may have seen a deprecation warning
when launching the hive client, saying that it is being replaced by
Beeline. The Beeline shell has always been available, but it required
supplying a database connection string every time, which was pretty
annoying. We now have a wrapper set up to make this easier. The old Hive
CLI will continue to exist, but we encourage moving over to Beeline. You
can use it by logging into the stat1002/1004 boxes as usual and launching
`beeline`.
There is some documentation on this here:
If you run into any issues using this interface, please ping us on the
Analytics list or in #wikimedia-analytics, or file a bug on Phabricator.
(If you are wondering "stat1004, what's that?", there should be an
announcement coming up about it soon!)
Curious, what percentage of digital assistants (Alexa, Siri, Cortana,
Google) cite Wikipedia when a person asks a question?
Does the current Wikipedia mobile app support voice search?
Are there any reports on this? Thanks in advance!
Stella Yu | STELLARESULTS | 415 690 7827
"Chronicling heritage brands and legendary people."
We are excited to announce that the 5th annual Wiki Workshop will take
place in Lyon on April 24, 2018, as part of The Web Conference 2018
(a.k.a. WWW2018).
You can access the call for papers at
http://wikiworkshop.org/2018/#call. Please submit your ongoing or
completed research related to Wikimedia projects to the workshop. Note
that 2018-01-28 is the submission deadline if you want your paper to
appear in the proceedings, and 2018-03-11 is the deadline for all other
papers.
Following the past year's model, the workshop will have a set of
invited talks (Jon Kleinberg and Markus Kroetzsch have already
accepted our invitation \o/), a poster session, and more.
Questions and comments are welcome. Otherwise, we're looking forward
to receiving your submissions and seeing you in Lyon in April. :)
Leila, on behalf of the organizers 
Senior Research Scientist
At the moment I am writing about Wikipedia's rules with regard to
research. Some researchers are interested in Wikipedia talk pages, others
want to interview Wikipedians, and others again run "experiments" within
the wiki in order to observe Wikipedians' reactions.
Researchers try to stick to some general ethics such as respecting
anonymity and not causing harm.
To my knowledge, only Wikipedia in English has some specific lines about
research in its set of rules (e.g. with the advice to disclose research
interests on a user page). Do you know about research-related rules in
other language versions?
The next Research Showcase will be live-streamed this Wednesday, February
21, 2018, at 11:30 AM PST (18:30 UTC).
YouTube stream: https://www.youtube.com/watch?v=fpmRWCE7F_I
As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here
This month's presentation:
*Visual enrichment of collaborative knowledge bases*
By Miriam Redi, Wikimedia Foundation
Images allow us to explain, enrich, and complement knowledge without
language barriers. They can help illustrate the content of an item in a
language-agnostic way to external data consumers. Images can be extremely
helpful in multilingual collaborative knowledge bases such as Wikidata.
However, a large proportion of Wikidata items lack images. More than 3.6M
Wikidata items are about humans (Q5), but only 17% of them have an
associated image. Overall, only 2.2M of 40 million Wikidata items have an
image.
A wider presence of images in such a rich, cross-lingual repository could
enable a more complete representation of human knowledge.
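For a rough sense of scale, here is a back-of-the-envelope calculation
using only the figures quoted in the abstract above (nothing else is
assumed):

```python
# Back-of-the-envelope on the coverage figures quoted in the abstract.
humans_total = 3_600_000        # Wikidata items about humans (Q5)
humans_with_image_pct = 0.17    # share of human items with an image
items_with_image = 2_200_000    # Wikidata items with an image, overall
items_total = 40_000_000        # total Wikidata items

humans_with_image = round(humans_total * humans_with_image_pct)
humans_missing_image = humans_total - humans_with_image
overall_coverage_pct = 100 * items_with_image / items_total

print(f"Humans with an image:    ~{humans_with_image:,}")      # ~612,000
print(f"Humans missing an image: ~{humans_missing_image:,}")   # ~2,988,000
print(f"Overall image coverage:  {overall_coverage_pct:.1f}%") # 5.5%
```

So roughly three million human items alone lack an image, and overall
coverage sits at about one item in eighteen.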
In this talk, we will discuss challenges and opportunities faced when using
machine learning and computer vision tools for the visual enrichment of
collaborative knowledge bases. We will share research to help Wikidata
contributors make Wikidata more “visual” by recommending high-quality
Commons images to Wikidata items. We will show the first results on
free-licence image quality scoring and recommendation and discuss future
work in this direction.
*Backlogs—backlogs everywhere: Using machine classification to clean up the
new page backlog*
By Aaron Halfaker, Wikimedia Foundation
If there's one insight that I've had about the functioning of Wikipedia and
other wiki-based online communities, it's that eventually self-directed
work breaks down and some form of organization becomes important for task
routing. In Wikipedia specifically, the notion of "backlogs" has become
dominant. There are backlogs of articles to create, articles to clean up,
articles to assess, new editor contributions to review, manual of style
rules to apply, etc. To a community of people working on a backlog, the
state of that backlog has deep effects on their emotional well-being. A
backlog that only grows is frustrating and exhausting.
Backlogs aren't inevitable, though, and they can take many shapes. In my
presentation, I'll tell a story about how English Wikipedia editors
defined a process and set of roles that formed a backlog
around new page creations. I'll make the argument that this formalization
of quality control practices has created a choke point and that
alternatives exist. Finally, I'll present a vision for such an alternative
using models that we have developed for ORES, the open machine prediction
service my team maintains.
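As a concrete illustration, scores from a service like ORES could be used
to split one monolithic review backlog into smaller, prioritized queues.
The sketch below is hypothetical: the v3 scores endpoint shape follows
ORES's public API, but the routing rule and queue names are illustrative
assumptions, not the actual design presented in the talk.

```python
# Sketch: use ORES model predictions to triage new pages into queues.
# The endpoint URL shape follows ORES's documented v3 API; the routing
# rules and queue names below are invented for illustration only.

ORES_BASE = "https://ores.wikimedia.org/v3/scores"

def ores_scores_url(context, rev_ids, models):
    """Build a request URL for ORES scores on a batch of revisions."""
    return (f"{ORES_BASE}/{context}/"
            f"?models={'|'.join(models)}"
            f"&revids={'|'.join(str(r) for r in rev_ids)}")

def route_new_page(draftquality_prediction):
    """Send a new page to a queue based on the draftquality model's class.

    The draftquality model predicts one of: OK, spam, vandalism, attack.
    """
    if draftquality_prediction == "OK":
        return "low-priority-queue"   # likely fine; spot-check later
    return "urgent-review-queue"      # needs a human reviewer soon

url = ores_scores_url("enwiki", [123456, 789012], ["draftquality"])
```

The point of the sketch is the triage shape: rather than one
first-in-first-out backlog, machine predictions let reviewers spend their
attention where it is most needed.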
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
Having a look at the new WMF research site, I noticed that notification
and recommendation mechanisms seem to be the key strategy being pursued
for filling Wikipedia's content gaps. Having just finished a research
project on exactly this problem, and having come to the opposite
conclusion, i.e. that automated mechanisms were insufficient for solving
the gaps problem, I was curious to find out more.
This latest research that I was involved in with colleagues was based on
an action research project aiming to fill gaps in topics relating to
South Africa. The team tried a range of different strategies discussed in
the literature for filling Wikipedia's gaps, without any wild success.
Automated mechanisms that featured missing and incomplete articles
catalysed very few contributions.
When looking for related research, it seemed that others had come to a
similar conclusion, i.e. that automated notifications/recommendations
alone didn't lead to improvements in particular target areas. That makes
me think that either a) I just haven't come across the right research, or
b) there are different types of gaps, and those different types require
different solutions. For example, filling gaps across language versions,
filling gaps created by incomplete articles, filling gaps on topics for
which there are few online/reliable sources (as opposed to topics for
which there are many), and filling gaps in articles about particular
topics or geographic areas may each call for a different approach.
Does anyone have any insight here? Either on research that would help
practitioners decide how to go about a project of filling gaps in a
particular subject area, or on whether the key focus of research at the
WMF is on filling gaps via automated means such as recommendation and
notification systems?
We would like to announce a research project with the goal of studying
whether user interactions recorded at the time of editing are suitable to
predict vandalism in real time.
Should vandal editing behavior be sufficiently different from normal
editing behavior, this would allow for a number of interesting real-time
prevention techniques. For example:
- withholding confidently suspicious edits for review before publishing
them,
- a popup asking "I am not a vandal" (as in Google's "I am not a robot") to
analyze vandal reactions,
- a popup with a chat box to personally engage vandals, e.g., to help them
find other ways of stress relief or to understand them better,
- or at the very least: a new signal to improve traditional vandalism
detectors.
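The first idea above, withholding confidently suspicious edits, could
look roughly like the following sketch. Everything here is invented for
illustration: the feature names, weights, and threshold are assumptions,
not signals or values from our project.

```python
# Hypothetical sketch: gate an edit on a real-time vandalism score computed
# from interaction signals recorded while editing. Feature names, weights,
# and the threshold are all invented for illustration.

def vandalism_score(features):
    """Combine a few behavioral signals into a score in [0, 1]."""
    weights = {
        "chars_deleted_ratio": 0.5,   # share of the page deleted
        "profanity_hits": 0.3,        # profane terms typed (normalized 0-1)
        "typing_burstiness": 0.2,     # erratic input timing (normalized 0-1)
    }
    score = sum(weights[k] * features.get(k, 0.0) for k in weights)
    return min(max(score, 0.0), 1.0)

def handle_edit(features, threshold=0.7):
    """Withhold confidently suspicious edits for review before publishing."""
    if vandalism_score(features) >= threshold:
        return "withhold-for-review"
    return "publish"
```

In a live setting, the features would come from the interaction log
recorded during editing, and the score would gate the save action.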
We have set up a laboratory environment to study editor behavior in a
realistic setting using a private mirror of Wikipedia. No editing
whatsoever is conducted on the real Wikipedia as part of our experiments,
and all test subjects of our user studies are made aware of the
experimental nature of their editing. We plan on making use of
crowdsourcing as a means to attain scale and diversity.
If you wish to participate in this study as a test subject yourself, please
get in touch. The more diversity, the more insightful the results will be.
We are also happy to collaborate and to answer all questions that may arise
in relation to the project. For example, our setup and tooling may turn out
to be useful to study other user behavior-related things without having to
actually deploy experiments within the live MediaWiki.
PS: The AICaptcha project seems most closely related. @Vinitha and Gergő:
if you wish, we can set up a Skype meeting to talk about avenues for
collaboration.
 A group of students and researchers from Bauhaus-Universität Weimar (
www.webis.de) and Leipzig University (www.temir.org); project PI: Martin
The Hadoop cluster maintenance (upgrade to Java 8) was planned to happen
earlier today but is finally happening now.
It will require a complete shutdown and should not last longer than a
couple of hours (expected to be less than one).
Joseph on behalf of the Analytics-Team
Hi Analytics folks,
*TL;DR: Hadoop cluster maintenance postponed to Tue 13th February*
We've experienced an issue getting some data onto the cluster this
month, meaning that some of our monthly datasets (the ones that depend on
that late data) have not yet been computed.
We have decided to postpone the cluster maintenance to next week, to
allow those jobs to finish.
We are very sorry about the short notice and will send another email the
day before maintenance.
Joseph Allemandou on behalf of the Analytics-Team
Data Engineer @ Wikimedia Foundation