Wiki-research-l May 2017

wiki-research-l@lists.wikimedia.org

28 participants
22 discussions

by song＠cs.umn.edu

Pursuant to prior discussions about the need for a research policy on Wikipedia, WikiProject Research is drafting a policy regarding the recruitment of Wikipedia users to participate in studies. At this time, we have a proposed policy, and an accompanying group that would facilitate recruitment of subjects in much the same way that the Bot Approvals Group approves bots. The policy proposal can be found at: http://en.wikipedia.org/wiki/Wikipedia:Research The Subject Recruitment Approvals Group mentioned in the proposal is being described at: http://en.wikipedia.org/wiki/Wikipedia:Subject_Recruitment_Approvals_Group Before we move forward with seeking approval from the Wikipedia community, we would like additional input about the proposal, and would welcome additional help improving it. Also, please consider participating in WikiProject Research at: http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Research -- Bryan Song GroupLens Research University of Minnesota

9 months, 2 weeks

[Analytics] Beeline as Hive client

by Madhumitha Viswanathan

Hi all, For all Hive users using stat1002/1004, you might have seen a deprecation warning when you launch the hive client saying that it's being replaced with Beeline. The Beeline shell has always been available to use, but it required supplying a database connection string every time, which was pretty annoying. We now have a wrapper script <https://github.com/wikimedia/operations-puppet/blob/production/modules/role…> set up to make this easier. The old Hive CLI will continue to exist, but we encourage moving over to Beeline. You can use it by logging into the stat1002/1004 boxes as usual, and launching `beeline`. There is some documentation on this here: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Beeline. If you run into any issues using this interface, please ping us on the Analytics list or #wikimedia-analytics or file a bug on Phabricator <http://phabricator.wikimedia.org/tag/analytics>. (If you are wondering stat1004 whaaat - there should be an announcement coming up about it soon!) Best, --Madhu :)

5 years, 6 months

Wikipedia aggregate clickstream data released

by Dario Taraborelli

We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia: http://dx.doi.org/10.6084/m9.figshare.1305770
This dataset contains counts of (referer, article) pairs aggregated from the HTTP request logs of English Wikipedia. This snapshot captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.
This data can be used for various purposes:
• determining the most frequent links people click on for a given article
• determining the most common links people followed to an article
• determining how much of the total traffic to an article clicked on a link in that article
• generating a Markov chain over English Wikipedia
We created a page on Meta for feedback and discussion about this release: https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream
Ellery and Dario
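As a rough illustration of the uses listed above, the sketch below turns (referer, article, count) rows into per-referer transition probabilities, which is one simple way to build a first-order Markov chain, and ranks the most-clicked links from a given article. The three-field tuple layout, the function names, and the sample rows are assumptions made for this example; they are not the schema of the released file, and the counts are invented.

```python
from collections import defaultdict


def transition_probabilities(pairs):
    """Convert (referer, article, count) rows into P(article | referer)."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for referer, article, n in pairs:
        counts[referer][article] += n
        totals[referer] += n
    # Normalise each referer's outgoing counts so they sum to 1.
    return {
        referer: {article: n / totals[referer] for article, n in targets.items()}
        for referer, targets in counts.items()
    }


def most_clicked(pairs, referer, k=3):
    """Return the k most frequent (article, count) targets reached from `referer`."""
    counts = defaultdict(int)
    for ref, article, n in pairs:
        if ref == referer:
            counts[article] += n
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical rows; the counts are made up for illustration only.
    rows = [
        ("London", "England", 30),
        ("London", "River_Thames", 10),
        ("England", "London", 20),
    ]
    print(transition_probabilities(rows))
    print(most_clicked(rows, "London", k=1))
```

Because the dataset only covers traffic that arrived via a referer, probabilities computed this way describe where observed clicks went, not the share of all readers of an article who clicked a link.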

6 years, 3 months

research trying to influence real-world outcomes by editing Wikipedia

by James Salsman

This was in the recent Research Newsletter: https://www.econstor.eu/bitstream/10419/127472/1/847290360.pdf They found a correlation between the length of articles about tourist destinations and the number of tourists visiting them. They tried to influence other destinations by adding content and did not find a correlation in the subsequent number of tourists, suggesting that the causation flows from tourism to article length instead. But I was taken aback by the last line of their paper, "using the suggested research design to study other areas of information acquisition, such as medicine or school choices could be fruitful directions." Are there any ethical guidelines concerning whether this is reasonable? Should there be?

6 years, 9 months

Wikipedia Detox: Scaling up our understanding of harassment on Wikipedia

by Ellery Wulczyn

Today we are announcing <https://blog.wikimedia.org/2017/02/07/scaling-understanding-of-harassment/> the first results of the collaboration between Wikimedia Research and Jigsaw on modeling personal attacks and other forms of harassment on English Wikipedia. We have released <https://figshare.com/projects/Wikipedia_Talk/16731> a corpus of 95M user and article talk page comments as well as over 1M human labels produced by 4000 crowd-workers for a set of 100k comments. Documentation on our methodology and future work can be found in our paper Ex Machina: Personal Attacks Seen at Scale <https://arxiv.org/abs/1610.08914> (to appear at WWW2017) and on our project page on meta <https://meta.wikimedia.org/wiki/Research:Detox>. If you are interested in contributing to the project, please get in touch via the project talk page <https://meta.wikimedia.org/wiki/Research_talk:Detox>. Another great way to get involved is to label a set of comments in the Wikilabels discussion quality campaign <http://labels.wmflabs.org/ui/enwiki/>.

6 years, 10 months

Video demos of upcoming changes to edit review / RC patrol

by Pine W

I'd like to highlight two videos (some people may have already seen these) that demo upcoming changes to edit review / RC patrol that take advantage of ORES. I feel that the changes look promising, and I hope that RC patrollers, Teahouse hosts, newbie adopters, and others will find that the changes make their work easier. I also hope for improved retention of good-faith contributors. 0. A succinct overview by Joe Matazzoni (WMF): https://commons.wikimedia.org/w/index.php?title=File%3ANew-feature_demo%E2%… 1. A more extensive overview, also by Joe, including valuable context, from the WMF Metrics Meeting for May 2017: https://www.youtube.com/watch?v=rAGwQdLyFb4 between 15:00 and 28:15. Pine

6 years, 10 months

Join my Reddit AMA about Wikipedia and ethical, transparent AI

by Aaron Halfaker

Hey everybody, TL;DR: I wanted to let you know about an upcoming experimental Reddit AMA ("ask me anything") chat we have planned. It will focus on artificial intelligence on Wikipedia and how we're working to counteract vandalism while also making life better for newcomers. We plan to hold this chat on June 1st at 21:00 UTC/14:00 PDT in the /r/iAMA subreddit[1]. I'd love to answer any questions you have about these topics, and I'll send a follow-up email to this thread shortly before the AMA begins. ---- For those who don't know who I am, I create artificial intelligences[2] that support the volunteers who edit Wikipedia[3]. I've been fascinated by the ways that crowds of volunteers build massive, high quality information resources like Wikipedia for over ten years. For more background, I research and then design technologies that make it easier to spot vandalism in Wikipedia, which helps support the hundreds of thousands of editors who make productive contributions. I also think a lot about the dynamics between communities and new users, and ways to make communities inviting and welcoming to both long-time community members and newcomers who may not be aware of community norms. For a quick sampling of my work, check out my most impactful research paper about Wikipedia[3], some recent coverage of my work from *Wired*[4], or check out the master list of my projects on my WMF staff user page[5], the documentation for the technology team I run[9], or the home page for Wikimedia Research[8]. This AMA, which I'm doing with the Foundation's Communications department, is somewhat of an experiment. The intended audience for this chat is people who might not currently be a part of our community but have questions about the way we work, as well as potential research collaborators who might want to work with our data or tools. Many may be familiar with Wikipedia but not the work we do as a community behind the scenes.
I'll be talking about the work I'm doing with the ethics of AI and how we think about artificial intelligence on Wikipedia, and ways we’re working to counteract vandalism on the world’s largest crowdsourced source of knowledge—like the ORES extension[6], which you may have seen highlighting possibly problematic edits on your watchlist and in RecentChanges. I’d love for you to join this chat and ask questions. If you do not or prefer not to use Reddit, we will also be taking questions on ORES' MediaWiki talk page[7] and posting answers to both threads. 1. https://www.reddit.com/r/IAmA/ 2. https://en.wikipedia.org/wiki/Artificial_intelligence 2. https://www.mediawiki.org/wiki/ORES 3. http://www-users.cs.umn.edu/~halfak/publications/The_Rise_and_Decline/halfa… 4. https://www.wired.com/2015/12/wikipedia-is-using-ai-to-expand-the-ranks-of-… 5. https://en.wikipedia.org/wiki/User:Halfak_(WMF) 6. https://www.mediawiki.org/wiki/Extension:ORES 7. https://www.mediawiki.org/wiki/Talk:ORES 8. https://www.mediawiki.org/wiki/Wikimedia_Research 9. https://www.mediawiki.org/wiki/Wikimedia_Scoring_Platform_team -Aaron Principal Research Scientist @ WMF User:EpochFail / User:Halfak (WMF)

6 years, 10 months

Research into Requests for Comments and the closing process

by Amy Zhang

Hi all, We are preparing to conduct some research into the process of how Requests for Comments (RfCs) get discussed and closed. This work is further described in the following Wikimedia page: https://meta.wikimedia.org/wiki/Research:Discussion_summarization_and_decision_support_with_Wikum To begin, we are planning to do a round of interviews with people who participate in RfCs in English Wikipedia, including frequent closers, infrequent closers, and people who participate in but don't close RfCs. We will be asking them about how they go about closing RfCs and their opinions on how the overall process could be improved. We are also creating a database of all the RfCs on English Wikipedia that have gone through a formal closure process and parsing their conversations. While planning the interviews, we thought that the information that we gather could be of interest to the Wikimedia community, so we wanted to open it up and ask if there was anything you would be interested in learning about RfCs or RfC closure from people who participate in them. Also, if you know of existing work in this area, please let us know. Thank you! Amy -- Amy X. Zhang | Ph.D. student at MIT CSAIL | http://people.csail.mit.edu/axz | @amyxzh

6 years, 10 months

Report/Reflection on CHI 2017

by Andrew Hall

Hello all, I recently attended the 2017 Conference on Human Factors in Computing Systems (CHI) and put together a small report/reflection for Aaron Halfaker regarding some of the work that was presented there that I found interesting. If you’d like to check the report out, it can be found here: https://meta.wikimedia.org/wiki/User:Hall1467/CHI_2017_Report. CHI is a yearly human-computer interaction conference and is a common venue for studies on peer production communities such as Wikipedia. Feel free to leave questions or comments in the talk page! Have a great rest of the week. Andrew

6 years, 10 months

Call for Submissions to the Doctoral Programme - 10th Conference on Intelligent Computer Mathematics - CICM 2017 - Deadline: June 5th, 2017

by serge.autexier＠dfki.de

Call for Submissions to the Doctoral Programme
10th Conference on Intelligent Computer Mathematics - CICM 2017
July 17-21, 2017, University of Edinburgh, Scotland
http://www.cicm-conference.org/2017
----------------------------------------------------------------------
* Submission deadline (Abstract + CV): 5 June 2017
* Notification of acceptance: 8 June 2017
For further information, see below.
NEWS: Invited Speakers at CICM 2017
- Alan Bundy (University of Edinburgh)
- Przemysław Chojecki (Polish Academy of Sciences)
- Grant Olney Passmore (University of Cambridge)
----------------------------------------------------------------------
Digital and computational solutions are becoming the prevalent means for the generation, communication, processing, storage and curation of mathematical information. Separate communities have developed to investigate and build computer based systems for computer algebra, automated deduction, and mathematical publishing as well as novel user interfaces. While all of these systems excel in their own right, their integration can lead to synergies offering significant added value. The Conference on Intelligent Computer Mathematics (CICM) offers a venue for discussing and developing solutions to the great challenges posed by the integration of these diverse areas.
CICM has been held annually as a joint meeting since 2008, co-locating related conferences and workshops to advance work in these subjects. Previous meetings have been held in Birmingham (UK 2008), Grand Bend (Canada 2009), Paris (France 2010), Bertinoro (Italy 2011), Bremen (Germany 2012), Bath (UK 2013), Coimbra (Portugal 2014), Washington DC (USA 2015) and Bialystok (Poland 2016).
This is a call for papers for CICM 2017, which will be held in Edinburgh, Scotland, July 17-21, 2017. CICM 2017 also invites work-in-progress papers.
The principal tracks of the conference are:
* Track: Calculemus (chair: Matthew England)
* Track: Digital Mathematical Libraries (DML) (chair: Olaf Teschke)
* Track: Mathematical Knowledge Management (MKM) (chair: Florian Rabe)
* Track: Systems & Data (chair: Osman Hasan)
* Track: Doctoral Programme (chair: Adnan Rashid)
The overall programme is organized by the General Program Chair Herman Geuvers. The local arrangements are coordinated by Jacques Fleuriot. The publicity chair is Serge Autexier.
CICM is also an excellent opportunity for graduate students to meet established researchers from the areas of computer algebra, automated deduction, and mathematical publishing. The Doctoral Programme provides a dedicated forum for PhD students to present and discuss their ideas, ongoing or planned research, and achieved results in an open atmosphere. It will consist of presentations by the PhD students to get constructive feedback, advice, and suggestions from the research advisory board, researchers, and other PhD students. Each PhD student will be assigned to an experienced researcher from the research advisory board who will act as a mentor and who will provide detailed feedback and advice on their intended and ongoing research.
Submissions to the Doctoral Programme are possible until 5 June 2017; details of the submission process are given below.
*Important Dates*
- Submission deadline (Abstract + CV): 5 June 2017
- Notification of acceptance: 8 June 2017
*Submission*
- Submission by e-mail to the DP chair Adnan Rashid
Email: adnan.rashid(a)seecs.edu.pk
Web: http://save.seecs.nust.edu.pk/adnanrashid/
More details on the conference are available from http://www.cicm-conference.org/2017

6 years, 11 months
