*TL;DR*
We want to get researchers in a room to experiment with infrastructure for
making open data science easier. We're focusing on three infrastructural
strategies: (1) improving metadata and indexing for open online community
datasets, (2) building an online querying service that makes processing,
joining, and extracting subsets of data easier, and (3) defining a protocol
for reporting research methods that will make studies easier to replicate
and extend.
*Title:* Breaking into new Data-Spaces: Infrastructure for Open Community
Science
*Date:* February 27, 2016
*Application deadline:* December 31, 2015
*Conference website:* http://cscw.acm.org/2016/program/workshops.php#WP-10
*Apply/info:*
https://meta.wikimedia.org/wiki/Research:Breaking_into_new_Data-Spaces
*Participants announced:* January 15, 2016
We encourage you to apply
<https://wikimedia.qualtrics.com/SE/?SID=SV_2bCdc2BGBGAWwmx> to a CSCW 2016
<http://cscw.acm.org/2016/> workshop focused on advancing your ability to
work with datasets from online communities. We will experiment with
documentation protocols and technologies that are designed to make the
process of “breaking into” a new dataset more tractable for researchers
studying open online communities.
*Who can participate*
Anyone who builds, manages, studies, or is interested in studying open
online communities can apply. Fill out our application form and tell us a
bit about your relevant interests and experience.
*Organizers*
Aaron Halfaker, Jonathan Morgan, Yuvaraj Pandian - Wikimedia Foundation
Elizabeth Thiry - Boundless
Kristen Schuster, A.J. Million, Sean Goggins - University of Missouri
William Rand - University of Maryland
David Laniado - Eurecat
*Abstract*
Despite being easily accessible, open online community (OOC) data can be
difficult to use effectively. To access and analyze large amounts
of data, researchers must first become familiar with the meaning of data
values. Then they must find a way to obtain and process the datasets to
extract their desired vectors of behavior and content. This process is
fraught with problems that are solved (with great difficulty) over and
over again by each research team/lab that breaks into datasets for a new
OOC.
In this workshop, we'll experiment with documentation protocols and
technologies that are designed to make the process of “breaking into” a new
dataset more tractable for researchers studying open online communities.
This workshop’s purpose is to bring researchers together to test these
systems and to identify problems and missed opportunities, supporting
further iteration. Participants will also have the opportunity to use
state-of-the-art documentation and technologies to break into a new
collection of datasets. This workshop is a direct result of calls to
action, from past CSCW workshops and related conferences, to build
infrastructure for data sharing between researchers.
For more information and to apply see:
https://meta.wikimedia.org/wiki/Research:Breaking_into_new_Data-Spaces
Hi Everyone,
The next Research Showcase will be live-streamed this Wednesday, November
18, 2015 at 11:30 (PST).
YouTube stream: http://www.youtube.com/watch?v=kXCI6whgdUA
As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#Archive>.
We look forward to seeing you!
Kind regards,
Sarah R. Rodlund
Project Coordinator-Engineering, Wikimedia Foundation
srodlund@wikimedia.org
This month:
*Impact, Characteristics, and Detection of Wikipedia Hoaxes*
By Srijan Kumar
False information on Wikipedia raises concerns about its credibility. One
way in which false information may be presented on Wikipedia is in the form
of hoax articles, i.e. articles containing fabricated facts about
nonexistent entities or events. In this talk, we study false information on
Wikipedia by focusing on the hoax articles that have been created
throughout its history. First, we assess the real-world impact of hoax
articles by measuring how long they survive before being debunked, how many
pageviews they receive, and how heavily they are referred to by documents
on the Web. We find that, while most hoaxes are detected quickly and have
little impact on Wikipedia, a small number of hoaxes survive long and are
well cited across the Web. Second, we characterize the nature of successful
hoaxes by comparing them to legitimate articles and to failed hoaxes that
were discovered shortly after being created. We find characteristic
differences in terms of article structure and content, embeddedness into
the rest of Wikipedia, and features of the editor who created the hoax.
Third, we successfully apply our findings to address a series of
classification tasks, most notably to determine whether a given article is
a hoax. Finally, we describe and evaluate a task in which humans
distinguish hoaxes from non-hoaxes. We find that humans are not
particularly good at the task and that our automated classifier outperforms
them by a wide margin.
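To make the classification setup concrete, here is a minimal sketch of the
kind of feature-based classifier the abstract describes. This is not the
authors' code: the feature names, example values, and model choice are
hypothetical stand-ins for the three feature families named above (article
structure/content, embeddedness in the rest of Wikipedia, and creator
characteristics).

    # A minimal sketch of a feature-based hoax classifier; NOT the
    # authors' code. Feature names and values are hypothetical.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # One row per article. Hypothetical columns, following the three
    # feature families in the abstract:
    #   structure/content: text_length, n_wiki_links, n_references
    #   embeddedness:      n_inlinks_from_other_articles
    #   creator:           creator_edit_count, creator_account_age_days
    X = np.array([
        [1200,  3,  0,  0,   5,   2],   # thin, isolated, new account
        [5400, 45, 12, 30, 900, 700],   # well-embedded, veteran editor
    ])
    y = np.array([1, 0])  # 1 = hoax, 0 = legitimate

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X, y)  # in practice: many labeled articles, not two
    print(clf.predict([[800, 1, 0, 0, 2, 1]]))  # score a new article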
Agree. A great step forward for all of us who do outreach. Many thanks to
everyone who made this happen :-)
--
James Heilman
MD, CCFP-EM, Wikipedian
The Wikipedia Open Textbook of Medicine
www.opentextbookofmedicine.com
As of July 2015 I am a board member of the Wikimedia Foundation
My emails, however, do not represent the official position of the WMF
Shall do! I'm already linking in the internal documentation :)
On 17 November 2015 at 21:11, Madhumitha Viswanathan
<mviswanathan@wikimedia.org> wrote:
> Woot! Nice :) Would be cool to link to the API docs from your README too.
>
> On Tue, Nov 17, 2015 at 5:54 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
>>
>> Hey!
>>
>> As y'all may have seen, we have a new pageviews API, with much finer
>> granularity and better recall than the existing data. Since I had
>> advance notice of the release, I was able to put together an R client
>> already - you can get it at https://github.com/Ironholds/pageviews if
>> R is your language of choice, and it'll be up on CRAN shortly.
>>
>> Thanks,
>>
>> --
>> Oliver Keyes
>> Count Logula
>> Wikimedia Foundation
>>
>
> --
> --Madhu :)
--
Oliver Keyes
Count Logula
Wikimedia Foundation
Hey!
As y'all may have seen, we have a new pageviews API, with much finer
granularity and better recall than the existing data. Since I had
advance notice of the release, I was able to put together an R client
already - you can get it at https://github.com/Ironholds/pageviews if
R is your language of choice, and it'll be up on CRAN shortly.
Thanks,
--
Oliver Keyes
Count Logula
Wikimedia Foundation
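For anyone not using R: the new pageviews API is also reachable as a plain
REST endpoint, so a few lines of Python work too. A minimal sketch; the
article and date range below are arbitrary examples.

    # A minimal sketch of querying the new pageviews API directly over
    # REST (the same data the R client exposes). Article and date range
    # are arbitrary examples.
    import requests

    BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews"
    url = (BASE + "/per-article/en.wikipedia/all-access/all-agents"
                  "/Albert_Einstein/daily/20151101/20151117")

    # A descriptive User-Agent is polite (and expected) for Wikimedia APIs.
    resp = requests.get(url, headers={"User-Agent": "pageviews-example/0.1"})
    resp.raise_for_status()

    for item in resp.json()["items"]:
        print(item["timestamp"], item["views"])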
+research
Fascinating. Thanks for sharing this, Nemo. And for setting those arrogant
Stackers straight ;)
For anyone else interested: Nemo was able to answer this question because
StackExchange has a Quarry <http://quarry.wmflabs.org/>-like public query
interface of its own. You should go play with it right now:
http://data.stackexchange.com/
Jonathan
On Fri, Nov 13, 2015 at 10:56 AM, Federico Leva (Nemo) <nemowiki@gmail.com>
wrote:
> Some information at
> https://meta.stackexchange.com/questions/269334/how-many-active-users-contr…
>
> TL;DR: not really, and definitely not StackOverflow alone (~14k). But
> perhaps the whole StackExchange has more than the English Wikipedia alone.
>
> Nemo
>
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
Hi all,
I'm writing this email on the public list hoping that the discussion may
be of interest to more people.
I am working with a student on scientific citations on Wikipedia and,
very simply put, we would like to use the pageview dataset to have a
rough measure of how many times a paper was viewed thanks to
Wikipedia.[*]
The full dataset is, as of now, ~ 4.7TB in size.
I have two questions:
* if we download this dataset, a first estimate suggests ~ 30 days of
continuous download (assuming an average download speed of ~ 2MB/s,
which is what we measured while downloading one month of data (~ 64GB);
see the back-of-envelope sketch below). Here at my university (Trento,
Italy), downloads of this kind have to be reported to the IT
department. I was wondering whether this would be useful information
for the WMF, too.
* given the estimate above, I was wondering if it is possible to
obtain this data over FedEx bandwidth (see [1]), i.e. by shipping a
physical disk. I know that in some fields (e.g. neuroscience) this
is the standard way to exchange big datasets (on the order of TBs).
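A quick back-of-envelope check of that first estimate, using only the
figures above (4.7TB total at a sustained ~ 2MB/s, decimal units):

    # Back-of-envelope check of the download-time estimate above.
    # Figures taken from the email: 4.7 TB total at a sustained ~2 MB/s.
    DATASET_TB = 4.7
    SPEED_MB_PER_S = 2.0

    total_mb = DATASET_TB * 1_000_000    # 1 TB = 10^6 MB (decimal units)
    seconds = total_mb / SPEED_MB_PER_S
    days = seconds / 86_400              # 86,400 seconds per day

    print("~%.0f days of continuous download" % days)  # ~27 days

That lands close to the ~ 30 days quoted once speed variation and overhead
are allowed for.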
Thanks in advance for your help.
Cristian
[*] I know these are pageviews and not unique visitors; furthermore,
there is no guarantee that viewing a citation means anything. I am
approaching this data the same way "impressions" versus
"click-throughs" are treated in the online advertising world.
[1] https://what-if.xkcd.com/31/