Wiki-research-l

wiki-research-l@lists.wikimedia.org

8 participants
2988 discussions

Re: [Wiki-research-l] unique visitors
by Kevin Leduc 17 Mar '16

FYI there was an official announcement about dropping the comScore numbers here: https://meta.wikimedia.org/wiki/ComScore/Announcement

Re: [Wiki-research-l] [Wikimedia-l] Research showcase: Evolution of privacy loss in Wikipedia
by Dario Taraborelli 17 Mar '16

On Wed, Mar 16, 2016 at 7:53 PM, SarahSV <sarahsv.wiki(a)gmail.com> wrote:
> Dario and Aaron, thanks for letting us know about this. Is the research
> available in writing for people who don't want to sit through the video?
>
> Sarah

Sarah – yes, see http://cm.cecs.anu.edu.au/post/wikiprivacy/

On Wed, Mar 16, 2016 at 12:55 PM, Aaron Halfaker <ahalfaker(a)wikimedia.org> wrote:
> Reminder, this showcase is starting in 5 minutes. See the stream here:
> https://www.youtube.com/watch?v=Xle0oOFCNnk
>
> Join us on Freenode at #wikimedia-research
> <http://webchat.freenode.net/?channels=wikimedia-research> to ask Andrei
> questions.
>
> -Aaron

--
*Dario Taraborelli* Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter <http://twitter.com/readermeter>

Research showcase: Evolution of privacy loss in Wikipedia
by Dario Taraborelli 16 Mar '16

This month, our research showcase <https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#March_2016> hosts Andrei Rizoiu (Australian National University) to talk about his work <http://cm.cecs.anu.edu.au/post/wikiprivacy/> on *how private traits of Wikipedia editors can be exposed from public data* (such as edit histories) using off-the-shelf machine learning techniques. (abstract below)

If you're interested in learning what the combination of machine learning and public data means for privacy and surveillance, come and join us this *Wednesday March 16* at *1pm Pacific Time*.

The event will be recorded and publicly streamed <https://www.youtube.com/watch?v=Xle0oOFCNnk>. As usual, we will be hosting the conversation with the speaker and Q&A on the #wikimedia-research channel on IRC.

Looking forward to seeing you there,

Dario

Evolution of Privacy Loss in Wikipedia

The cumulative effect of collective online participation has an important and adverse impact on individual privacy. As an online system evolves over time, new digital traces of individual behavior may uncover previously hidden statistical links between an individual’s past actions and her private traits. To quantify this effect, we analyze the evolution of individual privacy loss by studying the edit history of Wikipedia over 13 years, including more than 117,523 different users performing 188,805,088 edits. We trace each Wikipedia contributor using apparently harmless features, such as the number of edits performed on predefined broad categories in a given time period (e.g. Mathematics, Culture or Nature). We show that even at this unspecific level of behavior description, it is possible to use off-the-shelf machine learning algorithms to uncover usually undisclosed personal traits, such as gender, religion or education. We provide empirical evidence that the prediction accuracy for almost all private traits consistently improves over time. Surprisingly, the prediction performance for users who stopped editing after a given time still improves. The activities performed by new users seem to have contributed more to this effect than additional activities from existing (but still active) users. Insights from this work should help users, system designers, and policy makers understand and make long-term design choices in online content creation systems.

*Dario Taraborelli* Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter <http://twitter.com/readermeter>

Fwd: [Air-L] Extended Deadline - Final Call: Archives Unleashed 2.0 - Web Archive Hackathon
by Gheorghe Zugravu 16 Mar '16

Sorry for any cross-posting. It might be relevant to this list.

-------- Forwarded Message --------
Subject: [Air-L] Extended Deadline - Final Call: Archives Unleashed 2.0 - Web Archive Hackathon
Date: Tue, 15 Mar 2016 11:26:50 -0400
From: Matthew Weber <matthew.weber(a)rutgers.edu>
To: air-l(a)listserv.aoir.org

***Call for Participation***
***Deadline Extended to March 21***

Archives Unleashed 2.0: Web Archive Datathon
Library of Congress, Washington DC
14 – 15 June 2016

Travel grants available for US-based graduate students; additional funding will be available for international participants
Applications due March 21 2016
http://www.archivesunleashed.com

**This event is a follow-up to the Archives Unleashed datathon held in March at the University of Toronto Library. With generous funding from the National Science Foundation and the Social Sciences and Humanities Research Council (Canada), we’ve been able to extend the datathon program, and are excited to bring this program to the Library of Congress.**

The World Wide Web has a profound impact on how we research and understand the past. The sheer amount of cultural information that is generated and, crucially, preserved every day in electronic form, presents exciting new opportunities for researchers. Much of this information is captured within web archives.

Web archives often contain hundreds of billions of web pages, ranging from individual homepages and social media posts, to institutional websites. These archives offer tremendous potential for social scientists and humanists, and the questions research may pose stretch across a multitude of fields. Scholars broaching topics dating back to the mid-1990s will find their projects enhanced by web data. Moreover, scholars hoping to study the evolution of cultural and societal phenomena will find a treasure trove of data in web archives. In short, web archives offer the ability to reconstruct large-scale traces of the relatively recent past.

While there has been considerable discussion about web archive tools and datasets, few forums or mechanisms for coordinated, mutually informing development efforts have been created. This hackathon presents an opportunity to collaboratively unleash our web collections, exploring cutting-edge research tools while fostering a broad-based consensus on future directions in web archive analysis.

This hackathon will bring together a small group of 20-30 participants to collaboratively develop new open-source tools and approaches to web archives, and to kick off collaboratively inspired research projects. Researchers should be comfortable with command line interactions, and knowledge of a scripting language such as Python is strongly desired. By bringing together a group of like-minded scholars and programmers, we hope to begin building a unified analytic production effort and to continue coalescing this nascent research community.

At this event, we hope to converge on a shared vision of future directions in the use of web archives for inquiry in the humanities and social sciences in order to build a community of practice around various web archive analytics platforms and tools.

Thanks to the generous support of the National Science Foundation, the Social Sciences and Humanities Research Council of Canada, the University of Waterloo’s Department of History, the David R. Cheriton School of Computer Science and the University of Waterloo, and the School of Communication and Information at Rutgers University, we will cover all meals and refreshments for attendees. We are also providing sample datasets for people to work on during the hackathon; participants are also welcome to use their own. Included datasets are:

• the .gov web archive covering the American government domain
• Canadian Political Parties and Political Interest Groups collection

Those interested in participating should send a 250-word expression of interest and a CV to Matthew Weber (matthew.weber(a)rutgers.edu) by March 21 2016 with “Archives Unleashed” in the subject line. This expression of interest should address the scholarly questions that you will be bringing to the hackathon, and what datasets you might be interested in either working with or bringing to the event. Applicants will be notified by March 30 2016.

We have a limited number of travel grants available for graduate students; preference will be given to those who have not participated in the Archives Unleashed program in the past, although we welcome returning participants. These grants can cover up to $750 in expenses. If you are in an eligible position, please indicate in your statement of interest that you would like to be considered for the travel grant. A letter of support from your graduate supervisor will also strengthen your application.

On behalf of the organizers,
Matthew Weber (Rutgers University), Ian Milligan (University of Waterloo), Jimmy Lin (University of Waterloo)

Matthew S. Weber
matthew.weber(a)rutgers.edu
Assistant Professor
School of Communication and Information
http://www.matthewsweber.com

Re: [Wiki-research-l] Visualization of "What is Wikipedia about?" using information from Wikidata (Pine W)
by Peter Ekman 11 Mar '16

Pine:

I'm sorry that I can't read the graphic at http://www.informationisbeautifulawards.com/showcase/608-what-is-wikipedia-… It just doesn't zoom enough to read the part entitled "How to read it". I also can't find the graphic on the company's website http://www.askmedia.fr/

Frankly, it looks pretty flashy but if the content is supposed to be "What is Wikipedia about?" they've failed by getting a partial population from Wikidata. A random sample is usually a lot better than a partial population for this type of thing.

For example, see my graphic on "What's in Wikipedia?" at https://commons.wikimedia.org/wiki/File:Size_of_English_Wikipedia_(1000_vol… That's only for the English-language Wikipedia of course. Also see the discussion at https://en.wikipedia.org/wiki/User:Smallbones/1000_random_results

Just two quick questions. How does Wikipedia's coverage of science compare to its coverage of biographies of living sportsmen? Which graphic gives you that information?

Pete
aka User:Smallbones

Visualization of "What is Wikipedia about?" using information from Wikidata
by Pine W 11 Mar '16

Beautiful visualization, for those who enjoy such things: http://www.informationisbeautifulawards.com/showcase/608-what-is-wikipedia-… Pine

Fwd: [Wikitech-l] 6-April-2016 CREDIT call for demos
by Pine W 11 Mar '16

Forwarding announcement.

Pine

---------- Forwarded message ----------
From: Adam Baso <abaso(a)wikimedia.org>
Date: Thu, Mar 10, 2016 at 8:09 AM
Subject: [Wikitech-l] 6-April-2016 CREDIT call for demos
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>

Hi all,

As I noted recently, we're merging the WMF technical showcase with the lightning talks <https://www.mediawiki.org/wiki/Lightning_Talks> into what we're calling *CREDIT <https://www.mediawiki.org/wiki/CREDIT_showcase>.*

The next CREDIT showcase will be 6-April-2016 at 1800 UTC (1100 SF).

Please add your demos to https://etherpad.wikimedia.org/p/CREDIT

Thanks!
-Adam

The Wikimedia Research Newsletter 6(2) is out
by Mohammed Sadat 09 Mar '16

The February 2016 issue of the Wikimedia Research Newsletter is out:

https://blog.wikimedia.org/2016/03/08/research-newsletter-february-2016/
https://meta.wikimedia.org/wiki/Research:Newsletter/2016/February

In this issue:

1 "Monetary materialities of peer-produced knowledge: the case of Wikipedia and Its Tensions with Paid Labour"
2 "The Swedish Wikipedia gender gap"
3 Test of 300k citations: how verifiable is "verifiable" in practice?
4 Twelve years of Wikipedia research
5 Further criticism of study that had criticized accuracy of medical Wikipedia articles
6 Briefly
6.1 The attention economy of Wikipedia articles on news topics
6.2 A Swiss perspective on Wikipedia and academia

*** 18 recent publications were covered or listed in this issue ***

Thanks to Nicolas Jullien and Piotr Konieczny for contributing.

Masssly, Tilman Bayer and Dario Taraborelli

---
Wikimedia Research Newsletter
https://meta.wikimedia.org/wiki/Research:Newsletter/

* Follow us on Twitter: @WikiResearch
* Receive this newsletter by mail: https://lists.wikimedia.org/mailman/listinfo/research-newsletter
* Subscribe to the RSS feed: http://blog.wikimedia.org/c/research-2/wikimedia-research-newsletter/feed/

Call for Participation in WebSci'16 Hackathon [Exploring the Past of the Web: Alexandria & Archive-It Hackathon]
by Ujwal Gadiraju 08 Mar '16

Hello,

### APOLOGIES FOR CROSS-POSTING ###

------------------------------------------------------------------
Exploring the Past of the Web: Alexandria & Archive-It Hackathon
------------------------------------------------------------------

Web Science 2016 Hackathon: http://www.websci16.org/hackathon

-----------------
Hackathon Chairs
-----------------

Avishek Anand, L3S Research Center, Germany
Jefferson Bailey, Internet Archive, USA

The Web has pervaded all walks of life and has become an important corpus for studying the humanities, social sciences, and for use by computer scientists and other disciplines. Web archives collect, preserve, and provide ongoing access to ephemeral Web pages and hence encode traces of human thought, activity, and history. This makes them a valuable resource for analysis and study. However, there have been only a few concerted efforts to bring together tools, platforms, storage, processing frameworks, and existing collections for mining and analysing Web archives.

We present the Alexandria & Archive-It Hackathon @ WebSci’16 as a forum for scientists, engineers, practitioners, and enthusiasts to work with Web archive collections at scale, and to use and help build tools that can realize the largely untapped potential of Web archives in their research and work. The goal of the Hackathon is to bring together a small and focused group of participants to collaboratively work with Web archive collections using open-source tools and platforms and to discuss new ideas in exploring and analyzing these collections.

We will provide access to focused, subject-specific Web archive collections from a diverse set of institutions and topics. The data consists of collections from Archive-It, Internet Archive’s web archiving service, and is housed on a commercial data cluster (provided generously by www.altiscale.com) for processing and analysis, but can be browsed on the Web as well through their collection pages at https://archive-it.org/. The topics range from web pages collected around events (like the U.S. Occupy Movement), interest groups (politics, art, et cetera), home pages (museums, universities) and more. All collections were archived over a notable period of time and can support multiple analytical approaches and tools.

A range of collections will be available for use in the hackathon. Some examples of the types of collections to be included:

[1]. Human Rights web archive, collected by Columbia University: https://www.archive-it.org/collections/1068
[2]. Occupy Movement 2011/2012, collected by Internet Archive: https://archive-it.org/collections/2950
[3]. Auction Houses web archive, collected by New York Art Resources Consortium: https://www.archive-it.org/collections/2135
[4]. Contemporary Women Artists on the Web, collected by National Museum of Women in the Arts: https://archive-it.org/collections/2973

To lower the entry barrier in accessing and analysing this data we will provide a small hands-on session on Day 1, using existing open source tools, and will be able to provide some coaching during the Hackathon to groups not yet fully fluent in working with large data clusters. We want to ensure that participation will be truly cross-disciplinary, with the hope of fostering cross-fertilization of ideas from users and researchers from multiple disciplines, including social and political sciences, the humanities, and computer science. We will end the Hackathon on Day 4 with presentations of team accomplishments as well as discussions and exchange of ideas for future projects and collaborations.

The Hackathon will run in parallel to the WebSci’16 conference, to allow participants to register and attend the conference, and will finish one day after the conference.

Participants will receive promotional materials from the event hosts and Internet Archive and Archive-It. The research team with the most accomplished plan, project, or future work will receive a complimentary Archive-It account that can be used to build their own web archive collection for use in their own future research. Alexandria and Archive-It also plan on convening additional hackathons and web archive data mining challenges in conjunction with future conferences and events.

-------------
Registration
-------------

Registration for the Hackathon is free for WebSci'16 participants, and we also waive the charges for those participating only in the Hackathon.

If you want to register for "Hackathon Only": people who only want to attend the hackathon can register at http://websci16.org/registration by selecting "Dinner only" first and then, on the next page below their personal details, selecting "Hackathon only".

Feel free to contact us if you have any questions at: websci-hackathon(a)l3s.de.

Best Regards,
Ujwal Gadiraju

--
Ujwal Gadiraju
L3S Research Center
Leibniz Universität Hannover
30167 Hannover, Germany
Phone: +49. 511. 762-5772
Fax: +49. 511. 762-19712
E-Mail: gadiraju(a)l3s.de
Web: www.l3s.de/~gadiraju/

CSCW 2016 -- My highlights
by Aaron Halfaker 08 Mar '16

Hey folks,

I just gathered my notes about CSCW 2016 <https://cscw.acm.org/2016/index.php> which happened last week in San Francisco. I figured that some here might like to read a bit about what I saw there. In the report, I discuss a workshop that we organized and 7 papers/presentations that I thought were interesting.

https://meta.wikimedia.org/wiki/User:EpochFail/CSCW_2016_report

Enjoy!
-Aaron
