Wiki-research-l October 2016

wiki-research-l@lists.wikimedia.org

27 participants
21 discussions

Research Showcase October 19, 2016
by Sarah R 20 Oct '16


Hi Everyone,

The next Research Showcase will be live-streamed this Wednesday, October 19, 2016 at 11:30 AM (PST), 18:30 (UTC).

Link for remote presenters to join the Hangout on Air:

As usual, you can join the conversation on IRC at #wikimedia-research. And, you can watch our past research showcases here <https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#October_2016>.

YouTube stream: https://www.youtube.com/watch?v=cBImUZ_si5s

This month's showcase includes:

Human centered design for using and editing structured data in Wikipedia infoboxes
By *Charlie Kritschmar <https://www.mediawiki.org/wiki/User:Charlie_Kritschmar_(WMDE)>, UX Intern, Wikimedia Deutschland <https://meta.wikimedia.org/wiki/Wikimedia_Deutschland>*

Wikidata is a Wikimedia project which stores structured data to be used by other Wikimedia projects like Wikipedia. Currently, integrating its data in Wikipedia is difficult for users, since there is no predefined way to do so and it requires some technical knowledge. To tackle these issues, human-centered design methods were applied to identify needs, from which solutions were generated and evaluated with the help of the community. The concept may serve as a basis which may be implemented into various Wiki projects in the future to make editing Wikidata from within another Wikimedia project more user-friendly and improve the project’s acceptance in the community.

Emergent Work in Wikipedia
By *Ofer Arazy <http://oferarazy.com/> (University of Haifa)*

Online production communities present an exciting opportunity for investigating novel organizational forms. Extant theoretical accounts of knowledge co-production point to organizational policies, norms, and communication as key mechanisms enabling the coordination of work. Yet, in practice participants in initiatives such as Wikipedia are often occasional contributors who are unaware of community policies and do not communicate with other members. How then is work coordinated and how does the organization maintain stability in the face of dynamics in individuals’ task enactment? In this study we develop a conceptualization of emergent roles - the prototypical activity patterns that organically emerge from individuals’ spontaneous actions - and investigate the temporal dynamics of emergent role behaviors. Conducting a multi-level large-scale empirical study stretching over a decade, we tracked co-production of a thousand Wikipedia articles, logging two hundred thousand distinct participants and seven hundred thousand co-production activities. Using a combination of manual tagging and machine learning, we annotated each activity type, and then clustered participants’ activity profiles to arrive at seven prototypical emergent roles. Our analysis shows that participants’ behavior is turbulent, with substantial flow in and out of co-production work and across roles. Our findings at the organizational level, however, show that work is organized around a highly stable set of emergent roles, despite the absence of traditional stabilizing mechanisms such as pre-defined work procedures or role expectations. We conceptualize this dualism in emergent work as “Turbulent Stability”. Further analyses suggest that co-production is artifact-centric, where contributors mutually adjust according to the artifact’s changing needs. Our study advances the theoretical understandings of self-organizing knowledge co-production and particularly the nature of emergent roles.

Hope to see you there!

Sarah R. Rodlund
Senior Project Coordinator-Engineering, Wikimedia Foundation
srodlund(a)wikimedia.org


Wiki-editors' activity
by Alex Yarovoy 18 Oct '16


Hi,

Does Wikipedia store any metadata, logs or anything else useful for tracking one's activity? For instance, visited web pages, grants, additional user info etc. (anything beyond the known text edits and talk pages). We are trying to capture every available piece of information regarding one's behavior in Wikipedia. Any ideas?

Thanks,
Alex


Upcoming research newsletter (September 2016): new papers open for review
by masssly＠ymail.com 15 Oct '16


Hi everybody,

We’re preparing for the September 2016 research newsletter and looking for contributors. Please take a look at: https://etherpad.wikimedia.org/p/WRN201609 and add your name next to any paper you are interested in covering. The publication schedule is a bit mixed up currently - there is a chance we will already need to get out this issue in the next few days; but if you prefer to take more time, feel free to mark your contribution for the subsequent October issue instead, which should come out toward the end of this month. As usual, short notes and one-paragraph reviews are most welcome.

Highlights from this month:

• 5000 people on Brexit & US Elections
• A Smooth Transition to Modern mathoid-based Math Rendering in Wikipedia with Automatic Visual Regression Testing
• Answering End-User Questions, Queries and Searches on Wikipedia and its History
• Automated News Suggestions for Populating Wikipedia Entity Page
• Content Disputes in Wikipedia Reflect Geopolitical Instability
• Creating Causal Embeddings for Question Answering with Minimal Supervision
• Cultural Differences in the Understanding of History on Wikipedia
• Examining potential mechanisms underlying the Wikipedia gender gap through a collaborative editing task
• Expanding Wikidata's Parenthood Information by 178%, or How To Mine Relation Cardinalities
• Exploration on the Use of WDQS: Breakdown by Geography, User Agent and Referer Class
• Finding News Citations For Wikipedia
• Gender gap on Wikipedia: visible in all categories?
• How do students trust Wikipedia? An examination across genders
• Incorporating Relation Paths in Neural Relation Extraction
• Memory Remains: Understanding Collective Memory in the Digital Age
• Once You Step Over the First Line, You Become Sensitized to the Next: Towards a Gateway Theory of Online Participation
• Privacy, Anonymity, and Perceived Risk in Open Collaboration: A Study of Tor Users and Wikipedians
• Quality and Importance of Wikipedia Articles in Different Languages
• Using Semantic Web Technologies for Explaining and Predicting Abnormal Expenses
• Veni, Vidi, Vicipaedia: Using the Latin Wikipedia in an Advanced Latin Classroom
• WikInfoboxer: A Tool to Create Wikipedia Infoboxes Using Dbpedia
• Wikipedia and participatory culture: Why fans edit
• Writing for Wikipedia in the classroom: challenging official knowledge (a case study in 12th grade)

If you have any question about the format or process feel free to get in touch off-list.

Masssly, Tilman Bayer and Dario Taraborelli

[1] http://meta.wikimedia.org/wiki/Research:Newsletter


Feature Requests: Suggested new features
by Aaron Gray 13 Oct '16


Dear Wikipedia and MediaWiki people,

Here are some suggested ideas that may allow Wikipedia and MediaWiki to be organized better in the future, and for the future of organizing the world's open public information.

*Summaries - popup summaries for pages*

By automatically generating content for the title attribute of the <a> tag, taken either from an <article><header><section id="summary"> element or from a designated section of the Wikimedia markdown, a popup summary could be generated for quickly browsing the definitions of terms on hyperlinks. This would vastly aid the user experience.

*Categories - bread crumb like hierarchical and cross referencing categorization and navigation*

By creating a set of categorical navigation pages, the whole of the fields of knowledge on Wikipedia could be categorized.

By having a clickable list of hierarchical categories displayed like 'bread crumb' navigation lists under the page title, the user could quickly navigate this hierarchy.

By adding pop up menus to the separating chevrons, listing each subcategory's elements, cross category navigation would be made possible.

Double clicking on chevrons should navigate to the categorical navigation page.

*QuickLink - Quick Link Creation*

A hotkey and a JavaScript script could allow the creation of links from a selected, highlighted bit of normal text: look up the term, display its summary and allow the user to confirm the generation of a new hyperlink very quickly, without having to edit markdown.

*Move towards semantic content*

By using new HTML elements like <article>, <section> and <header>, together with ids and classes, more of a semantic mapping of content may be established. This may be done incrementally, and also for example by a bot auto generating new summary information that may be verified by either users or editors for publishing.

*API*

APIs for summaries, categories, and semantic content should be made available.

More to come ...

Regards,

Aaron Gray
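
As a rough illustration of the popup-summary idea above, here is a minimal Python sketch that builds an <a> element whose title attribute carries a short summary. The function name `summary_link` and the `max_len` cut-off are hypothetical and not part of MediaWiki; only the standard-library `html.escape` call is real.

```python
from html import escape


def summary_link(href: str, text: str, summary: str, max_len: int = 200) -> str:
    """Return an <a> element whose title attribute holds a short summary.

    Hypothetical helper for illustration only; MediaWiki does not expose
    a function with this name.
    """
    # Collapse runs of whitespace so the tooltip reads as one line.
    summary = " ".join(summary.split())
    # Truncate long summaries and mark the cut with an ellipsis.
    if len(summary) > max_len:
        summary = summary[: max_len - 1].rstrip() + "…"
    # Escape every value so quotes or angle brackets cannot break the markup.
    return '<a href="{}" title="{}">{}</a>'.format(
        escape(href, quote=True), escape(summary, quote=True), escape(text)
    )


if __name__ == "__main__":
    print(summary_link("/wiki/Wikidata", "Wikidata", "A structured data project."))
```

Browsers show the title attribute as a plain-text tooltip, so a sketch like this could only carry unformatted summaries.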


Identifying bots and bot edit decline
by Flöck, Fabian 12 Oct '16


Hi all,

two questions, maybe someone can help:

1. I was trying to compile a complete list of all bots that were ever (potentially) active on the English Wikipedia so that one can identify bot accounts in the dumps. Below are all the lists (including historic bots) that I could find [1]. Out of those overlapping lists, I extracted 2795 unique bot names (some seem to be just names for bot approval request pages). Going through the historic edit data (no current redirects), 1377 user names were actually in that list. Does anyone know if that should cover (almost) all ever active bots, or is there even a better list/method? I would like to avoid using unreliable regular expressions. (Similar question for other language editions)

2. I counted bot edits per half year in en.wikipedia and saw a major decrease in the first half of 2013 (January to July), from ~3M to ~1M edits per half year, which seems to be in line with official stats [2]. This is likely not news, so can someone enlighten me regarding what brought about that sharp decline of bot edits?

Cheers,
Fabian

[1]
https://en.wikipedia.org/wiki/Wikipedia:List_of_bots_by_number_of_edits
https://en.wikipedia.org/wiki/Wikipedia:Bots/Status/inactive_bots_1
https://en.wikipedia.org/wiki/Wikipedia:Bots/Status/inactive_bots_2
https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_ed…
https://en.wikipedia.org/w/api.php?action=query&list=allusers&augroup=bot
https://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitl…
https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/Approved (+ contents of all archive pages)
https://stats.wikimedia.org/EN/TablesWikipediaEN.htm#bots

[2] https://stats.wikimedia.org/EN/TablesWikipediaEN.htm#editor_activity_levels

--
Dr. Fabian Flöck
Researcher
Computational Social Science department
GESIS - Leibniz Institute for the Social Sciences
Unter Sachsenhausen 6-8, 50667 Cologne, Germany
Tel: + 49 (0) 221-47694-208
fabian.floeck(a)gesis.org
www.gesis.org
www.facebook.com/gesis.org
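
The two steps described in this message (merging overlapping bot lists into one set of unique names, then counting bot edits per half year) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the method actually used: the function names are hypothetical, edits are assumed to arrive as (user name, timestamp) pairs, and timestamps are assumed to be in the ISO form `2013-01-15T00:00:00Z`.

```python
from collections import Counter
from datetime import datetime


def normalize_username(name: str) -> str:
    """Treat underscores as spaces and upper-case the first letter,
    mirroring how MediaWiki treats user names as equivalent."""
    name = name.replace("_", " ").strip()
    return name[:1].upper() + name[1:]


def merge_bot_lists(*lists):
    """Union several overlapping bot lists into one set of unique names."""
    return {normalize_username(n) for names in lists for n in names if n.strip()}


def half_year(timestamp: str) -> str:
    """Map an ISO timestamp (assumed format) to a label such as '2013-H1'."""
    ts = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return f"{ts.year}-H{1 if ts.month <= 6 else 2}"


def bot_edits_per_half_year(edits, bot_names):
    """Count edits made by known bot accounts, bucketed by half year.

    `edits` is an iterable of (user_name, timestamp) pairs (assumed shape).
    """
    counts = Counter()
    for user, timestamp in edits:
        if normalize_username(user) in bot_names:
            counts[half_year(timestamp)] += 1
    return counts
```

A name list built this way only identifies accounts that appear on one of the source lists; unflagged or unlisted bots would still be counted as human editors.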


Monthly (predicted) article quality dataset
by Aaron Halfaker 12 Oct '16


Hey folks,

I just finished working with Amir[1,2] and building off of some of Morten's work[3] to put together something that I think you're going to like.

> Halfaker, Aaron (2016): Monthly Wikipedia article quality predictions.
> figshare.
> https://dx.doi.org/10.6084/m9.figshare.3859800
> Retrieved: 00 56, Oct 12, 2016 (GMT)

This dataset contains a row for every article-month since 20010101. Each row has an article quality prediction based on text-only machine classifier (from [3] with slight improvement) and hosted by ORES[4]. We've managed to build models for English, French, and Russian Wikipedia, so I've generated datasets for each of those wikis. It's current as of 2016-08-01 and I plan to run updates periodically.

Here are the columns:

- page_id -- The page identifier
- page_title -- The title of the article (UTF-8_with_underscores)
- rev_id -- The most recent revision ID at the time of assessment
- timestamp -- The timestamp when the assessment was taken (YYYYMMDDHHMMSS)
- prediction -- The predicted quality class ("Stub", "Start", "C", "B", "GA", "FA", ...)
- weighted_sum -- The sum of prediction weights assuming indexed class ordering ("Stub" = 0, "Start" = 1, ...)

I'll update the docs based on your questions :)

1. https://phabricator.wikimedia.org/p/Ladsgroup/
2. https://github.com/Ladsgroup
3. http://www-users.cs.umn.edu/~morten/publications/wikisym2013-tellmemore.pdf
4. https://ores.wikimedia.org/

-Aaron
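
To make the weighted_sum and timestamp columns concrete, here is a small Python sketch. It assumes the six-class ordering quoted above (other wikis may use different classes) and a hypothetical mapping from class name to predicted probability; the function names are illustrative and are not taken from the dataset's tooling.

```python
from datetime import datetime

# Assumed class ordering, following the "Stub" = 0, "Start" = 1, ... indexing
# described in the column list. Other wikis may define other classes.
CLASSES = ["Stub", "Start", "C", "B", "GA", "FA"]


def weighted_sum(probabilities: dict) -> float:
    """Sum of class index times predicted probability.

    `probabilities` is a hypothetical mapping such as {"Stub": 0.5, "Start": 0.5};
    classes missing from the mapping count as probability 0.
    """
    return sum(i * probabilities.get(c, 0.0) for i, c in enumerate(CLASSES))


def parse_timestamp(ts: str) -> datetime:
    """Parse the YYYYMMDDHHMMSS timestamp column into a datetime."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")
```

Under this reading, an article predicted as half "Stub" and half "Start" would get a weighted_sum of 0.5, which gives a continuous quality score rather than a single class label.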


[Round 2 Proposal] Arc.heolo.gy, a 2- and 3d visualization and parsing library for Wikipedia
by Ian Seyer 11 Oct '16


Hi there,

I would like this to be a place to discuss the grant proposed here: https://meta.wikimedia.org/wiki/Grants:Project/Arc.heolo.gy

The goal of the project is to provide a powerful semantic library to analyze and visualize relationships that might otherwise be hidden within the immense amount of knowledge contained in Wikipedia.

Any feedback or critiques based on any aspect of the project including tech stack, methodology, community engagement practices, or goals, would be HUGELY appreciated.

We are also looking for volunteers who are interested in devops, graph technology, NLP (word2vec or otherwise), or data visualization!

Thank you for your time,
Ian Seyer


Dutch research paper on the gendergap
by Jane Darnell 09 Oct '16


Interesting thesis on the gender gap in English Wikipedia by a student at the University of Amsterdam: http://www.scriptiesonline.uba.uva.nl/document/642528


Re: [Wiki-research-l] Feature Requests: Suggested new features - PDF Scraping and creating living working documents
by Aaron Gray 09 Oct '16


*Wikimedia Documents - PDF Scraping and creating living working documents*

Wikimedia should have a way of creating documents that can easily be imported from and exported to PDF, Word and other formats.

- Document versioning as well as history should be provided.
- Authorship and control of authorship are also needed.
- Fixed and editable status should also be provided.
- Document licensing should also be a feature; this should allow management of information from different sources.

Document licensing should allow copying and collation to be done in a controlled way, allowing information dissemination.

PDF documents that are put into the public domain, or released under an open license that permits modification, could be scraped and, where permission is given to subsume them, rendered into MediaWiki pages in a modifiable state.

Those that are in the public domain or under a fixed content license may still be rendered into MediaWiki pages.

Quick access to the original document and modification history should be mandatory at the top of every original document page.

Auto quotations and citations can also be generated at the bottom of the pages.

More to come ...

Regards,

Aaron Gray


CfP: 2017 International Conference on Social Media & Society (#SMSociety) - Toronto, Canada - July 28-30, 2017
by Anatoliy 06 Oct '16


Apologies for cross-postings

********************************

2017 International Conference on Social Media & Society (#SMSociety)

WHEN: July 28-30, 2017
WHERE: Toronto, Canada (Ted Rogers School of Management, Ryerson University)

SUBMISSION DEADLINES:
Dec 5, 2016: Workshops, Tutorials, & Panels
Jan 16, 2017: Full & WIP Papers
Mar 6, 2017: Poster Abstracts

Conference website: http://SocialMediaAndSociety.org

2017 #SMSociety Theme: Social Media for Social Good or Evil

CALL FOR PROPOSALS

Our online behaviour is far from virtual--it extends our offline lives. Much social media research has identified the positive opportunities of using social media; for example, how people use social media to form support groups online, participate in political uprising, raise money for charities, extend teaching and learning outside the classroom, etc. However, mirroring offline experiences, we have also seen social media being used to spread propaganda and misinformation, recruit terrorists, live stream criminal activities, reinforce echo chambers by politicians, and perpetuate hate and oppression (such as racist, sexist, homophobic, and anti-Semitic behaviour). Furthermore, behind the posts are algorithms, power structures, commercial interests and other factors that surreptitiously influence our experiences on social media. So, we ask:

* What does it actually mean to use social media for social good?
* How can social media be further leveraged for social justice? What are the threats to meaningful participation and how can we overcome these threats?
* What do we know about the 4 W's of who, what, why, where (and how) do people engage in anti-social behaviour online?
* What theoretical and methodological tools can we use to study anti-social behaviour? Can we detect such behaviour automatically?
* What are the ethics of algorithms (inclusion, accessibility, data discrimination, bots)?
* What are the legal, policy, privacy, and ethical implications of using social big data?
* Considering the proliferation of bots online, can we still trust social media data?
* And more broadly, what are the major effects of using social media on political, economic, individual, and social aspects of our society?

The 2017 International Conference on Social Media & Society (#SMSociety) invites scholarly and original submissions that relate to the broad theme of Social Media & Society. We welcome both quantitative and qualitative work which crosses interdisciplinary boundaries and expands our understanding of the current and future trends in social media research, especially those that explore some of the questions and issues raised above.

ABOUT THE CONFERENCE:

The International Conference on Social Media & Society (#SMSociety) is an annual gathering of leading social media researchers from around the world. Now in its 8th year, the 2017 conference will be held in Toronto, Canada at Ted Rogers School of Management, Ryerson University on July 28-30.

From its inception, the Conference has focused on the best practices for studying the impact and implications of social media on society. Our invited industry and academic keynotes have highlighted the shifting questions and concerns for the social media research community: from introducing media multiplexity and networked individualism with Caroline Haythornthwaite and Barry Wellman in 2010 and 2011, to measuring influence with Gilad Lotan and Sharad Goel in 2012 and 2013, to defining social media research as a field with Keith Hampton in 2014, to identifying our commitments as social media researchers in policy making with Bill Dutton in 2015, to exploring the future of social media technologies with John Weigelt in 2015, to highlighting the challenges of social media data mining in the context of big data with Susan Halford and Helen Kennedy in 2016.
Organized by the Social Media Lab <http://socialmedialab.ca/> at Ted Rogers School of Management <http://www.ryerson.ca/tedrogersschool/> at Ryerson University, the conference provides participants with opportunities to exchange ideas, present original research, learn about recent and ongoing studies, and network with peers. The conference's intensive three-day program features workshops, full papers, work-in-progress papers, panels, and posters. The wide-ranging topics in social media showcase research from scholars working in many fields including Communication, Computer Science, Education, Journalism, Information Science, Management, Political Science, Sociology, Social Work, etc.

SUBMISSION DETAILS: See online at https://socialmediaandsociety.org/submit/

PUBLISHING OPPORTUNITIES:

Full and WIP (short) papers presented at the Conference will be published in the conference proceedings by ACM International Conference Proceeding Series (ICPS) <http://dl.acm.org/citation.cfm?id=2930971&CFID=847001369&CFTOKEN=15617273&preflayout=flat#prox> and will be available in the ACM Digital Library. All conference presenters will be invited to submit their work as a full paper to the special issue of the Social Media + Society journal <http://sms.sagepub.com/> (published by SAGE).

TOPICS OF INTEREST:

Social Media Impact on Society
* Political Mobilization & Engagement
* Extremism & Terrorism
* Politics of Hate and Oppression
* The Sharing/Attention Economy
* Social Media & Health
* Virality & Memes
* Social Media & Social Justice
* Social Media & Business (Marketing, PR, HR, Risk Management, etc.)
* Social Media & Academia (Alternative Metrics, Learning Analytics, etc.)
* Social Media & Public Administration
* Social Media & the News

Online/Offline Communities
* Trust & Credibility in Social Media
* Online Community Detection
* Influential User Detection
* Identity

Social Media & Small Data
* Case Studies of Online Communities Formed on Social Media
* Case Studies of Offline Communities that Rely on Social Media
* Sampling Issues
* Value of Small Data

Social Media & Big Data
* Visualization of Social Media Data
* Social Media Data Mining
* Scalability Issues & Social Media Data
* Social Media Analytics
* Ethics of Big Data/Algorithms

Theories & Methods
* Qualitative & Quantitative Approaches
* Opinion Mining & Sentiment Analysis
* Social Network Analysis
* Theoretical Models for Studying, Analysing and Understanding Social Media

Social Media & Mobile
* App-ification of Society
* Privacy & Security Issues in the Mobile World
* Apps for the Social Good
* Networking Apps

ORGANIZING COMMITTEE:
Anatoliy Gruzd, Ryerson University, Canada - Conference Chair
Jenna Jacobson, University of Toronto, Canada - Conference Chair
Philip Mai, Ryerson University, Canada - Conference Chair
K. Hazel Kwon, Arizona State University, USA - Poster Chair

ADVISORY BOARD:
William H. Dutton, Michigan State University, USA
Zizi Papacharissi, University of Illinois at Chicago, USA
Barry Wellman, INSNA Founder, The Netlab Network

