Requests with "Special:Random" / total number of requests for each project.
On Sat, Mar 19, 2016 at 11:08 PM, <
wiki-research-l-request(a)lists.wikimedia.org> wrote:
> Send Wiki-research-l mailing list submissions to
> wiki-research-l(a)lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> or, via email, send a message with subject or body 'help' to
> wiki-research-l-request(a)lists.wikimedia.org
>
> You can reach the person managing the list at
> wiki-research-l-owner(a)lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wiki-research-l digest..."
>
>
> Today's Topics:
>
> 1. Re: Wiki-research-l Digest, Vol 127, Issue 15 (Felix J. Scholz)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 19 Mar 2016 18:08:11 -0400
> From: "Felix J. Scholz" <felixjacobscholz(a)gmail.com>
> To: Research into Wikimedia content and communities
> <wiki-research-l(a)lists.wikimedia.org>
> Subject: Re: [Wiki-research-l] Wiki-research-l Digest, Vol 127, Issue
> 15
> Message-ID:
> <
> CADBcukYu1kDZLLvTOKJRQAo6SO017mXMnceWYxPDUJmnm--sLA(a)mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Most likely, it is the absolute number of random searches in billions
> On Mar 19, 2016 6:01 PM, "Andrew Gray" <andrew.gray(a)dunelm.org.uk> wrote:
>
> > Hi Alex,
> >
> > Stupid question - is that "3.36% of all random-article searches were on
> > Latvian WP", or "3.36% of all searches (pageviews?) on the Latvian WP
> > were through random-article"?
> >
> > Andrew.
> >
> > On 19 March 2016 at 21:34, Alex Druk <alex.druk(a)gmail.com> wrote:
> >
> >> Mark J. Nelson writes:
> >> >Specifically, a small slice of content, mainly English Wikipedia
> >> >articles on pop culture, recent news events, and U.S. politics,
> >> >contribute a disproportionate share of views. (A weekly top-25 list for
> >> >enwiki is at https://en.wikipedia.org/wiki/Wikipedia:Top_25_Report ). So
> >> >if you're measuring aggregate numbers, you're measuring mainly that
> >> >specific type of content. If the goal is really simply to reach as many
> >> >people as possible, have high page views and unique visitor counts,
> >> >etc., then this subset of articles is really the only important part of
> >> >Wikimedia's mission--- articles on, say, mathematics, don't contribute
> >> >anything to moving the needle if that's the metric.
> >>
> >> One should also consider the fact that a significant number of users use
> >> Wikipedia as entertainment. One example of such use is random searches.
> >> Across all Wikipedia sites, the number of random searches in 2014
> >> exceeded 1 billion.
> >> Here is a simple graph illustrating this:
> >> [image: Inline image 1]
> >>
> >> ~~Alex
> >>
> >>
> >> --
> >> Thank you.
> >>
> >> Alex Druk
> >> alex.druk(a)gmail.com
> >> (775) 237-8550 Google voice
> >>
> >> _______________________________________________
> >> Wiki-research-l mailing list
> >> Wiki-research-l(a)lists.wikimedia.org
> >> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> >>
> >>
> >
> >
> > --
> > - Andrew Gray
> > andrew.gray(a)dunelm.org.uk
> >
> > _______________________________________________
> > Wiki-research-l mailing list
> > Wiki-research-l(a)lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> >
> >
>
Mark J. Nelson writes:
>Specifically, a small slice of content, mainly English Wikipedia
>articles on pop culture, recent news events, and U.S. politics,
>contribute a disproportionate share of views. (A weekly top-25 list for
>enwiki is at https://en.wikipedia.org/wiki/Wikipedia:Top_25_Report ). So
>if you're measuring aggregate numbers, you're measuring mainly that
>specific type of content. If the goal is really simply to reach as many
>people as possible, have high page views and unique visitor counts,
>etc., then this subset of articles is really the only important part of
>Wikimedia's mission--- articles on, say, mathematics, don't contribute
>anything to moving the needle if that's the metric.
One should also consider the fact that a significant number of users use
Wikipedia as entertainment. One example of such use is random searches. Across
all Wikipedia sites, the number of random searches in 2014 exceeded 1 billion.
Here is a simple graph illustrating this:
[image: Inline image 1]
~~Alex
On Sat, Mar 19, 2016 at 1:00 PM, <
wiki-research-l-request(a)lists.wikimedia.org> wrote:
> Message: 1
> Date: Fri, 18 Mar 2016 20:57:45 +0100
> From: Mark J. Nelson <mjn(a)anadrome.org>
> To: Research into Wikimedia content and communities
> <wiki-research-l(a)lists.wikimedia.org>
> Subject: Re: [Wiki-research-l] unique visitors
> Message-ID: <87zitvzhhi.fsf(a)mjn.anadrome.org>
> Content-Type: text/plain
>
> phoebe ayers <phoebe.wiki(a)gmail.com> writes:
>
> > I wonder if there's a qualitative project somewhere in here about
> > *types* of use -- e.g. if I'm using WP on my phone & my work pc is
> > that really equivalent use? Perhaps I am using them for different
> > kinds of information seeking, e.g. looking up terms related to work vs
> > looking up info on movie stars -- does this different kind of use
> > matter for how we construct and present information, or count "use"?
>
> Beyond the issue of devices, I think this is important in part because
> the raw traffic counts (and reach numbers and similar) paint a very
> specific story of what Wikimedia is doing and is successful at. (And
> what you measure influences what you tend to optimize for.)
>
> Specifically, a small slice of content, mainly English Wikipedia
> articles on pop culture, recent news events, and U.S. politics,
> contribute a disproportionate share of views. (A weekly top-25 list for
> enwiki is at https://en.wikipedia.org/wiki/Wikipedia:Top_25_Report ). So
> if you're measuring aggregate numbers, you're measuring mainly that
> specific type of content. If the goal is really simply to reach as many
> people as possible, have high page views and unique visitor counts,
> etc., then this subset of articles is really the only important part of
> Wikimedia's mission--- articles on, say, mathematics, don't contribute
> anything to moving the needle if that's the metric.
>
> -Mark
--
Thank you.
Alex Druk
alex.druk(a)gmail.com
(775) 237-8550 Google voice
We're thrilled to announce the list of papers accepted at the WWW 2016 Wiki
Workshop <http://snap.stanford.edu/wikiworkshop2016/>. You can follow
@wikiworkshop16 <https://twitter.com/wikiworkshop16> for updates.
Dario
(on behalf of the organizers)
Johanna Geiß and Michael Gertz
With a Little Help from my Neighbors: Person Name Linking Using the
Wikipedia Social Network

Ramine Tinati, Markus Luczak-Roesch and Wendy Hall
Finding Structure in Wikipedia Edit Activity: An Information Cascade
Approach

Paolo Boldi and Corrado Monti
Cleansing Wikipedia Categories using Centrality

Thomas Steiner
Wikipedia Tools for Google Spreadsheets

Yu Suzuki and Satoshi Nakamura
Assessing the Quality of Wikipedia Editors through Crowdsourcing

Vikrant Yadav and Sandeep Kumar
Learning Web Queries For Retrieval of Relevant Information About an Entity
in a Wikipedia Category

Haggai Roitman, Shay Hummel, Ella Rabinovich, Benjamine Sznajder, Noam
Slonim and Ehud Aharoni
On the Retrieval of Wikipedia Articles Containing Claims on Controversial
Topics

Tanushyam Chattopadhyay, Santa Maiti and Arindam Pal
Automatic Discovery of Emerging Trends using Cluster Name Synthesis on User
Consumption Data

Freddy Brasileiro, João Paulo A. Almeida, Victorio A. Carvalho and
Giancarlo Guizzardi
Applying a Multi-Level Modeling Theory to Assess Taxonomic Hierarchies in
Wikidata

*Dario Taraborelli *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>
Hi all,
Can someone help me with my failing memory and remind me what the
current state of affairs is re: unique visitors -- we're not counting
them anymore? We are counting them but not via comscore? Something
else?
Just putting together a talk and wanted the latest numbers.
thanks,
-- Phoebe
--
* I use this address for lists; send personal messages to phoebe.ayers
<at> gmail.com *
On Wed, Mar 16, 2016 at 7:53 PM, SarahSV <sarahsv.wiki(a)gmail.com> wrote:
> Dario and Aaron, thanks for letting us know about this. Is the research
> available in writing for people who don't want to sit through the video?
>
> Sarah
>
Sarah – yes, see http://cm.cecs.anu.edu.au/post/wikiprivacy/
> On Wed, Mar 16, 2016 at 12:55 PM, Aaron Halfaker <ahalfaker(a)wikimedia.org>
> wrote:
>
> > Reminder, this showcase is starting in 5 minutes. See the stream here:
> > https://www.youtube.com/watch?v=Xle0oOFCNnk
> >
> > Join us on Freenode at #wikimedia-research
> > <http://webchat.freenode.net/?channels=wikimedia-research> to ask Andrei
> > questions.
> >
> > -Aaron
> > _______________________________________________
> > Wikimedia-l mailing list, guidelines at:
> > https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
> > New messages to: Wikimedia-l(a)lists.wikimedia.org
> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> > <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
> >
--
*Dario Taraborelli *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>
This month, our research showcase
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#March_2016> hosts
Andrei Rizoiu (Australian National University) to talk about his work
<http://cm.cecs.anu.edu.au/post/wikiprivacy/> on *how private traits of
Wikipedia editors can be exposed from public data* (such as edit histories)
using off-the-shelf machine learning techniques. (abstract below)
If you're interested in learning what the combination of machine learning
and public data means for privacy and surveillance, come and join us this
*Wednesday March 16* at *1pm Pacific Time*.
The event will be recorded and publicly streamed
<https://www.youtube.com/watch?v=Xle0oOFCNnk>. As usual, we will be hosting
the conversation with the speaker and Q&A on the #wikimedia-research
channel on IRC.
Looking forward to seeing you there,
Dario
Evolution of Privacy Loss in Wikipedia
The cumulative effect of collective
online participation has an important and adverse impact on individual
privacy. As an online system evolves over time, new digital traces of
individual behavior may uncover previously hidden statistical links between
an individual’s past actions and her private traits. To quantify this
effect, we analyze the evolution of individual privacy loss by studying the
edit history of Wikipedia over 13 years, including more than 117,523
different users performing 188,805,088 edits. We trace each Wikipedia
contributor using apparently harmless features, such as the number of edits
performed on predefined broad categories in a given time period (e.g.
Mathematics, Culture or Nature). We show that even at this unspecific level
of behavior description, it is possible to use off-the-shelf machine
learning algorithms to uncover usually undisclosed personal traits, such as
gender, religion or education. We provide empirical evidence that the
prediction accuracy for almost all private traits consistently improves
over time. Surprisingly, the prediction performance for users who stopped
editing after a given time still improves. The activities performed by new
users seem to have contributed more to this effect than additional
activities from existing (but still active) users. Insights from this work
should help users, system designers, and policy makers understand and make
long-term design choices in online content creation systems.
*Dario Taraborelli *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>
Sorry for any cross-posting.
It might be relevant to this list.
-------- Forwarded Message --------
Subject: [Air-L] Extended Deadline - Final Call: Archives Unleashed 2.0
- Web Archive Hackathon
Date: Tue, 15 Mar 2016 11:26:50 -0400
From: Matthew Weber <matthew.weber(a)rutgers.edu>
To: air-l(a)listserv.aoir.org
***Call for Participation*** ***Deadline Extended to March 21***
Archives Unleashed 2.0: Web Archive Datathon
Library of Congress, Washington DC
14 – 15 June 2016
Travel grants available for US-based graduate students; additional funding will be available for international participants
Applications due March 21 2016
http://www.archivesunleashed.com
**This event is a follow-up to the Archives Unleashed datathon held in March at the University of Toronto Library. With generous funding from the National Science Foundation and the Social Sciences and Humanities Research Council (Canada), we’ve been able to extend the datathon program, and are excited to bring this program to the Library of Congress.**
The World Wide Web has a profound impact on how we research and understand the past. The sheer amount of cultural information that is generated and, crucially, preserved every day in electronic form, presents exciting new opportunities for researchers. Much of this information is captured within web archives.
Web archives often contain hundreds of billions of web pages, ranging from individual homepages and social media posts, to institutional websites. These archives offer tremendous potential for social scientists and humanists, and the questions research may pose stretch across a multitude of fields. Scholars broaching topics dating back to the mid-1990s will find their projects enhanced by web data. Moreover, scholars hoping to study the evolution of cultural and societal phenomena will find a treasure trove of data in web archives. In short, web archives offer the ability to reconstruct large-scale traces of the relatively recent past.
While there has been considerable discussion about web archive tools and datasets, few forums or mechanisms for coordinated, mutually informing development efforts have been created. This hackathon presents an opportunity to collaboratively unleash our web collections, exploring cutting-edge research tools while fostering a broad-based consensus on future directions in web archive analysis.
This hackathon will bring together a small group of 20-30 participants to collaboratively develop new open-source tools and approaches to web archives, and to kick off collaboratively inspired research projects. Researchers should be comfortable with command line interactions, and knowledge of a scripting language such as Python is strongly desired. By bringing together a group of like-minded scholars and programmers, we hope to begin building a unified analytic production effort and to continue coalescing this nascent research community.
At this event, we hope to converge on a shared vision of future directions in the use of web archives for inquiry in the humanities and social sciences in order to build a community of practice around various web archive analytics platforms and tools.
Thanks to the generous support of the National Science Foundation, the Social Sciences and Humanities Research Council of Canada, the University of Waterloo’s Department of History, the David R. Cheriton School of Computer Science and the University of Waterloo, and the School of Communication and Information at Rutgers University, we will cover all meals and refreshments for attendees. We are also providing sample datasets for participants to work on during the hackathon, though they are welcome to bring their own. Included datasets are:
• the .gov web archive covering the American government domain
• Canadian Political Parties and Political Interest Groups collection
Those interested in participating should send a 250-word expression of interest and a CV to Matthew Weber (matthew.weber(a)rutgers.edu) by March 21 2016 with “Archives Unleashed” in the subject line. This expression of interest should address the scholarly questions that you will be bringing to the hackathon, and what datasets you might be interested in either working with or bringing to the event. Applicants will be notified by March 30 2016.
We have a limited number of travel grants available for graduate students; preference will be given to those who have not participated in the Archives Unleashed program in the past, although we welcome returning participants. These grants can cover up to $750 in expenses. If you are in an eligible position, please indicate in your statement of interest that you would like to be considered for the travel grant. A letter of support from your graduate supervisor will also strengthen your application.
On behalf of the organizers,
Matthew Weber (Rutgers University), Ian Milligan (University of Waterloo), Jimmy Lin (University of Waterloo)
Matthew S. Weber
matthew.weber(a)rutgers.edu
Assistant Professor
School of Communication and Information
http://www.matthewsweber.com
_______________________________________________
The Air-L(a)listserv.aoir.org mailing list
is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers:
http://www.aoir.org/