Wiki-research-l May 2015

wiki-research-l@lists.wikimedia.org

26 participants
15 discussions

Wikipedia Research policy
by song＠cs.umn.edu 14 Jul '23

14 Jul '23

Pursuant to prior discussions about the need for a research policy on Wikipedia, WikiProject Research is drafting a policy regarding the recruitment of Wikipedia users to participate in studies. At this time, we have a proposed policy, and an accompanying group that would facilitate recruitment of subjects in much the same way that the Bot Approvals Group approves bots.

The policy proposal can be found at:
http://en.wikipedia.org/wiki/Wikipedia:Research

The Subject Recruitment Approvals Group mentioned in the proposal is being described at:
http://en.wikipedia.org/wiki/Wikipedia:Subject_Recruitment_Approvals_Group

Before we move forward with seeking approval from the Wikipedia community, we would like additional input about the proposal, and would welcome additional help improving it. Also, please consider participating in WikiProject Research at:
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Research

--
Bryan Song
GroupLens Research
University of Minnesota

8 10

Wikipedia aggregate clickstream data released
by Dario Taraborelli 17 Jan '18

17 Jan '18

We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia:

http://dx.doi.org/10.6084/m9.figshare.1305770

This dataset contains counts of (referer, article) pairs aggregated from the HTTP request logs of English Wikipedia. This snapshot captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.

This data can be used for various purposes:
• determining the most frequent links people click on for a given article
• determining the most common links people followed to an article
• determining how much of the total traffic to an article clicked on a link in that article
• generating a Markov chain over English Wikipedia

We created a page on Meta for feedback and discussion about this release:
https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream

Ellery and Dario
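The uses listed in the announcement can be sketched in a few lines of Python. This is only a minimal illustration, assuming each record has been loaded as a `(referer, article, count)` tuple; that layout, the function names, and the sample rows are hypothetical and are not taken from the released files.

```python
from collections import defaultdict

# Illustrative rows only: (referer, article, count). Not real dataset values.
ROWS = [
    ("London", "River_Thames", 120),
    ("London", "Big_Ben", 80),
    ("other-google", "London", 1000),
    ("Big_Ben", "London", 40),
    ("River_Thames", "London", 10),
]


def top_outgoing(rows, referer, k=3):
    """Most frequently clicked targets from `referer`, highest count first."""
    targets = [(article, n) for ref, article, n in rows if ref == referer]
    return sorted(targets, key=lambda t: (-t[1], t[0]))[:k]


def top_incoming(rows, article, k=3):
    """Most common referers leading to `article`, highest count first."""
    sources = [(ref, n) for ref, art, n in rows if art == article]
    return sorted(sources, key=lambda t: (-t[1], t[0]))[:k]


def transition_probabilities(rows):
    """Row-normalise counts into P(article | referer): a first-order Markov chain."""
    totals = defaultdict(int)
    for ref, _, n in rows:
        totals[ref] += n
    chain = defaultdict(dict)
    for ref, article, n in rows:
        chain[ref][article] = n / totals[ref]
    return dict(chain)


if __name__ == "__main__":
    print(top_outgoing(ROWS, "London"))
    print(top_incoming(ROWS, "London"))
    print(transition_probabilities(ROWS)["London"])
```

Normalising per referer only describes the observed clicks; requests that ended without a further click are not represented in such a chain unless they are added separately.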

5 5

CFP for BigData2015: International Conference in Mauritius
by Grace Allas 22 May '15

22 May '15

*** CALL FOR PAPERS ***

http://sdiwc.net/conferences/bigdata2015/

The Second International Conference on Data Mining, Internet Computing, and Big Data (BigData2015)
University of Mauritius, Le Reduit, Moka, Mauritius
June 29 – July 01, 2015

All registered papers will be included in SDIWC Digital Library.

==============================================================

The conference aims to enable researchers to build connections between different digital applications. The event will be held over three days, with presentations delivered by researchers from the international community, including presentations from keynote speakers and state-of-the-art lectures.

RESEARCH TOPICS ARE NOT LIMITED TO:

* Data Mining Tasks & Algorithms
  Explorative and visual data mining
  Mining text and semi-structured data
  Multimedia mining (audio/video)
  Segmentation/Clustering/Association
  Web mining
  Artificial neural networks
  Link and sequence analysis
  Evolutionary computation/meta heuristics

* Data Mining Integration & Process
  Distributed and grid based data mining
  Metadata and ontologies
  Mining large scale data
  Attribute discretization and encoding
  Feature selection and transformation
  Model interpretation
  Data cleaning and preparation

* Data Mining Applications
  Bioinformatics
  Business/Corporate/Industrial Data Mining
  Credit Scoring
  Data Mining in Logistics
  Database Marketing
  Direct Marketing
  Engineering Mining
  Medicine Data Mining
  Military Data Mining
  Security Data Mining
  Social Science Mining
  Time series analysis and visualization
  Anomaly detection
  Association rule learning
  Classification
  Cloud based infrastructure (applications, storage and resources)
  Cluster analysis
  Crowd-sourcing
  Data fusion and integration
  Data-mining grids
  Distributed databases
  Distributed file systems
  Ensemble learning
  Genetic algorithms
  Machine learning
  Massively parallel-processing (MPP) databases
  Natural language processing
  Neural networks
  Pattern recognition
  Predictive modelling

* Internet Computing
  Design and analysis of internet protocols and engineering
  Digital libraries/digital image collections
  Electronic commerce and internet
  Grid based computing and internet tools
  Internet and emerging technologies
  Internet and video technologies
  Internet applications and appliances
  Internet banking systems
  Internet based decision support systems
  Internet law and compliance
  Internet security and trust
  Markup Languages
  Metacomputing
  Mobile computing and the internet
  Network architectures and network computing
  Novel Java applications on internet
  Quality of service
  Search engines
  Social networks
  The WWW and intranets
  The internet and Cloud computing
  Web based computing
  Web interfaces to databases
  Web site design and coordination
  Search-based applications
  Sentiment analysis
  Signal processing
  Simulation
  Supervised and unsupervised learning

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

IMPORTANT DATES:

Submission Deadline : May 29, 2015
Notification of Acceptance : 2 - 4 weeks from the submission date
Camera Ready Submission : Open from now until June 09, 2015
Registration Date : Open from now until June 09, 2015
Conference Dates : June 29 - July 01, 2015

Researchers are encouraged to submit their work electronically. All papers will be fully refereed by a minimum of two specialized referees. Before final acceptance, all referees' comments must be considered.

Paper Submission: http://sdiwc.net/conferences/bigdata2015/paper-submission/

Write us for more details: bigdata15(a)sdiwc.net

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1 0

Upcoming research newsletter (May 2015): new papers open for review
by masssly＠ymail.com 15 May '15

15 May '15

Hi everybody,

we’re preparing for the May 2015 research newsletter and looking for contributors. Please take a look at:

https://etherpad.wikimedia.org/p/WRN201505

and add your name next to any paper you are interested in covering. As usual, short notes and one-paragraph reviews are most welcome.

Highlights from this month:

Wikidata through the Eyes of DBpedia
Predicting elections from online information flows: towards theoretically informed models
Understanding Graph Structure of Wikipedia for Query Expansion
A New Epistemic Culture Wikipedia as an Arena for the Production of Knowledge in Late Modernity
The EU Public Interest Clinic and Wikimedia Present: Extending Freedom of Panorama in Europe
Utilizing the Wikidata System to Improve the Quality of Medical Content in Wikipedia in Diverse Languages: A Pilot Study
Eliciting Disease Data from Wikipedia Articles
Centre Stage: How Social Network Position Shapes Linguistic Coordination
Synthesizing knowledge from disagreement
Aligning Sentences from Standard Wikipedia to Simple Wikipedia
Debating reliable sources: writing the history of the Vietnam War on Wikipedia
Turning Introductory Comparative Politics and Elections Courses into Social Science Research Communities Using Wikipedia: Improving Both Teaching and Research

If you have any questions about the format or process, feel free to get in touch off-list.

Masssly, Tilman Bayer and Dario Taraborelli

[1] http://meta.wikimedia.org/wiki/Research:Newsletter

2 1

May 2015 research showcase
by Leila Zia 13 May '15

13 May '15

Hi everyone,

The next research showcase will be live-streamed this Wednesday, May 13 at 11.30 PT. The streaming link will be posted on the lists a few minutes before the showcase starts and as usual, you can join the conversation on IRC at #wikimedia-research.

We look forward to seeing you!

Leila

This month

*The people's classifier: Towards an open model for algorithmic infrastructure*
By Aaron Halfaker <https://www.mediawiki.org/wiki/User:Halfak_(WMF)>

Recent research has implicated Wikipedia's algorithmic infrastructure in perpetuating social issues. However, these same algorithmic tools are critical to maintaining efficiency of open projects like Wikipedia at scale. But rather than simply critiquing algorithmic wiki-tools and calling for less algorithmic infrastructure, I'll propose a different strategy -- an open approach to building this algorithmic infrastructure. In this presentation, I'll demo a set of services that are designed to open up a critical part of Wikipedia's quality control infrastructure -- machine classifiers. I'll also discuss how this strategy unites critical/feminist HCI with more dominant narratives about efficiency and productivity.

*Social transparency online*
By Jennifer Marlow <http://www.aboutjmarlow.com/> and Laura Dabbish <http://www.lauradabbish.com/>

An emerging Internet trend is greater social transparency, such as the use of real names in social networking sites, feeds of friends' activities, traces of others' re-use of content, and visualizations of team interactions. There is a potential for this transparency to radically improve coordination, particularly in open collaboration settings like Wikipedia. In this talk, we will describe some of our research identifying how transparency influences collaborative performance in online work environments.

First, we have been studying professional social networking communities. Social media allows individuals in these communities to create an interest network of people and digital artifacts, and get moment-by-moment updates about actions by those people or changes to those artifacts. It affords an unprecedented level of transparency about the actions of others over time. We will describe qualitative work examining how members of these communities use transparency to accomplish their goals.

Second, we have been looking at the impact of making workflows transparent. In a series of field experiments we are investigating how socially transparent interfaces, and activity trace information in particular, influence perceptions and behavior towards others and evaluations of their work.

2 1

Re: [Wiki-research-l] Wiki-research-l Digest, Vol 117, Issue 17
by Alex Druk 12 May '15

12 May '15

Thank you, Federico! Your link to Phabricator explains this. However, it would be nice if such changes were described in a Read.me file.

Alex

On Tue, May 12, 2015 at 2:00 PM, <wiki-research-l-request(a)lists.wikimedia.org> wrote:

> Send Wiki-research-l mailing list submissions to
> wiki-research-l(a)lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> or, via email, send a message with subject or body 'help' to
> wiki-research-l-request(a)lists.wikimedia.org
>
> You can reach the person managing the list at
> wiki-research-l-owner(a)lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wiki-research-l digest..."
>
> Today's Topics:
>
> 1. Re: How to explain drop in random searches (Daniel Moyer)
> 2. Re: How to explain drop in random searches (Federico Leva (Nemo))
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 11 May 2015 23:13:15 -0700
> From: Daniel Moyer <moyerd(a)usc.edu>
> To: Research into Wikimedia content and communities <wiki-research-l(a)lists.wikimedia.org>
> Subject: Re: [Wiki-research-l] How to explain drop in random searches
> Message-ID: <CAKvQcvXcMXSc2SkDVJTTbs2MXuCSpeHcHeSd=gWkg6bwY8DqjQ(a)mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> A lot of thanks and credit to the analytics team for keeping these counts running.
>
> That being said, it might be a good idea not to draw too many conclusions from the pageview counts on user behaviour without a closer analysis, especially for the Special:* pages. As demonstrated by the October 16th drop, these are strongly affected by instrument bias.
>
> On Mon, May 11, 2015 at 10:56 PM, Alex Druk <alex.druk(a)gmail.com> wrote:
>
> > Because similar patterns are observed for many other languages (but not all), it looks like R.Stuart Geiger explanation is correct: from October 16 2014 Special:Random page is just not counted any more (with some not clear exceptions).
> >
> > That’s a pity because we lost valuable source of info how Wikipedia users look for information. Random search was (and is?) a major way users explore Wikipedias. In many languages Special:Random was significantly higher than Main_Page count and certainly higher than search with index.php.
> >
> > (I do not want to point finger, but maybe somebody at WMF considered this emotionally.)
> >
> > IMHO, logs should be logs and log actual activity. At least such dramatic changes in logging user’s activity should be documented somewhere. Betters in Read.me file that should accompany raw logs.
> >
> > >Date: Mon, 11 May 2015 20:08:40 -0700
> > >From: "R.Stuart Geiger" <sgeiger(a)gmail.com>
> > >To: Research into Wikimedia content and communities <wiki-research-l(a)lists.wikimedia.org>
> > >Subject: Re: [Wiki-research-l] Wiki-research-l Digest, Vol 117, Issue 14
> > >Message-ID: <CAKt0Q=e-_=0=aepKeSVnT0Ce2FmJZu5bNtpNnYwZV7x21A3tXQ(a)mail.gmail.com>
> > >Content-Type: text/plain; charset="utf-8"
> > >
> > >Going from 86,000,000 a month to 31,000 a month is quite a drop, and the shift is pretty dramatic. It goes from 1.7 million one day to 715 the next and stays flat (http://stats.grok.se/en/201410/Special:Random).
> > >
> > >I was also thinking there could be a bot or something that is scraping Special:Random, but the drop also happens for Special:Random/Talk -- which hardly anybody uses, but it still drops flat the same day (http://stats.grok.se/en/201410/Special:Random/Talk). It doesn't happen for Special:Upload or Special:Log though.
> > >
> > >October 16th, 2014 is the day it changes. Anybody know of something that might have changed that day with logging? Also, there have to be way more than ~1,000 hits a day to Special:Random. Perhaps pageviews started to be counted for the page that it got redirected to, rather than the Special:Random page itself. But then why wouldn't it go to 0? What are those ~1,000 hits a day?
> > >
> > >👻 ~~ it is a mystery ~~ 👻
> >
> > --
> > Thank you.
> >
> > Alex Druk
> > alex.druk(a)gmail.com
> > www.wikipediatrends.com
> > (775) 237-8550 Google voice
> >
> > _______________________________________________
> > Wiki-research-l mailing list
> > Wiki-research-l(a)lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

1 0

Re: [Wiki-research-l] How to explain drop in random searches
by Alex Druk 12 May '15

12 May '15

Because similar patterns are observed for many other languages (but not all), it looks like R. Stuart Geiger's explanation is correct: since October 16, 2014, the Special:Random page is simply not counted any more (with some unclear exceptions).

That’s a pity, because we have lost a valuable source of information about how Wikipedia users look for information. Random search was (and is?) a major way users explore Wikipedias. In many languages the Special:Random count was significantly higher than the Main_Page count, and certainly higher than search with index.php.

(I do not want to point fingers, but maybe somebody at WMF considered this emotionally.)

IMHO, logs should be logs and record actual activity. At the very least, such dramatic changes in the logging of user activity should be documented somewhere, preferably in a Read.me file that accompanies the raw logs.

>Date: Mon, 11 May 2015 20:08:40 -0700
>From: "R.Stuart Geiger" <sgeiger(a)gmail.com>
>To: Research into Wikimedia content and communities <wiki-research-l(a)lists.wikimedia.org>
>Subject: Re: [Wiki-research-l] Wiki-research-l Digest, Vol 117, Issue 14
>Message-ID: <CAKt0Q=e-_=0=aepKeSVnT0Ce2FmJZu5bNtpNnYwZV7x21A3tXQ(a)mail.gmail.com>
>Content-Type: text/plain; charset="utf-8"
>
>Going from 86,000,000 a month to 31,000 a month is quite a drop, and the shift is pretty dramatic. It goes from 1.7 million one day to 715 the next and stays flat (http://stats.grok.se/en/201410/Special:Random).
>
>I was also thinking there could be a bot or something that is scraping Special:Random, but the drop also happens for Special:Random/Talk -- which hardly anybody uses, but it still drops flat the same day (http://stats.grok.se/en/201410/Special:Random/Talk). It doesn't happen for Special:Upload or Special:Log though.
>
>October 16th, 2014 is the day it changes. Anybody know of something that might have changed that day with logging? Also, there have to be way more than ~1,000 hits a day to Special:Random. Perhaps pageviews started to be counted for the page that it got redirected to, rather than the Special:Random page itself. But then why wouldn't it go to 0? What are those ~1,000 hits a day?
>
>👻 ~~ it is a mystery ~~ 👻

--
Thank you.

Alex Druk
alex.druk(a)gmail.com
www.wikipediatrends.com
(775) 237-8550 Google voice
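A one-day break like the one described in the quoted message can be located mechanically before anyone interprets it as a change in reader behaviour. The following is a rough sketch, assuming the daily counts are already available as a plain list of integers; the function name and the sample series are hypothetical, and only the values 1.7 million and 715 come from this thread.

```python
def largest_drop(daily_counts):
    """Return (index, ratio) for the day whose count fell most relative to the day before.

    `ratio` is count[i] / count[i - 1]; a value far below 1 marks an abrupt break.
    Returns (None, 1.0) if no day is lower than its predecessor.
    """
    best_index, best_ratio = None, 1.0
    for i in range(1, len(daily_counts)):
        prev, cur = daily_counts[i - 1], daily_counts[i]
        if prev <= 0:
            continue  # avoid dividing by zero on empty days
        ratio = cur / prev
        if ratio < best_ratio:
            best_index, best_ratio = i, ratio
    return best_index, best_ratio


# Illustrative series: only 1,700,000 and 715 are from the thread, the rest is made up.
counts = [1_650_000, 1_720_000, 1_700_000, 715, 802, 760]
day, ratio = largest_drop(counts)

if __name__ == "__main__":
    print(f"largest drop at index {day}, ratio {ratio:.6f}")
```

A fall of several orders of magnitude within one day that then stays flat is more consistent with a change in counting than with a change in usage, which is the point made in the thread about instrument bias.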

4 3

Re: [Wiki-research-l] Wiki-research-l Digest, Vol 117, Issue 14
by Alex Druk 12 May '15

12 May '15

I just grepped the monthly totals from Erik Zachte's files at http://dumps.wikimedia.org/other/pagecounts-ez/merged/ (grep "^en.z Special:Random ")

On Mon, May 11, 2015 at 2:00 PM, <wiki-research-l-request(a)lists.wikimedia.org> wrote:

> Send Wiki-research-l mailing list submissions to
> wiki-research-l(a)lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> or, via email, send a message with subject or body 'help' to
> wiki-research-l-request(a)lists.wikimedia.org
>
> You can reach the person managing the list at
> wiki-research-l-owner(a)lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wiki-research-l digest..."
>
> Today's Topics:
>
> 1. Re: How to explain drop in random searches (Oliver Keyes)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 10 May 2015 08:30:37 -0400
> From: Oliver Keyes <okeyes(a)wikimedia.org>
> To: Research into Wikimedia content and communities <wiki-research-l(a)lists.wikimedia.org>
> Subject: Re: [Wiki-research-l] How to explain drop in random searches
> Message-ID: <CAAUQgdA6jVzgs3QQXVVgsh7MFthWxpD97uisDTJmjNUgZZXH5A(a)mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> Using what data?
>
> On 10 May 2015 at 05:29, Alex Druk <alex.druk(a)gmail.com> wrote:
> > Hi everyone,
> >
> > I try to learn dynamic of random searches (Special:Random) on English Wikipedia.
> >
> > From 01/2012 to 10/2014 average number of random searches per month was about 86 millions or about 30% of Main_Page pageviews, but from November 2014 it drop to 31,000 per month (or 0.008% of Main_page).
> >
> > How to explain such a dramatic drop? Any ideas?
> >
> > --
> > Thank you.
> >
> > Alex Druk, PhD
> > wikipediatrends.com
> > alex.druk(a)gmail.com
> > (775) 237-8550 Google voice
> >
> > _______________________________________________
> > Wiki-research-l mailing list
> > Wiki-research-l(a)lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation
>
> ------------------------------
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
> End of Wiki-research-l Digest, Vol 117, Issue 14
> ************************************************

--
Thank you.

Alex Druk
alex.druk(a)gmail.com
(775) 237-8550 Google voice
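The grep mentioned at the top of this message can be reproduced in Python when the totals need further processing. This is a sketch only: it assumes each line of a merged file is space-separated with the project code, the page title and the monthly total as its first three fields, which is an assumption about the file layout rather than something stated in the thread. The function name and sample lines are hypothetical.

```python
def monthly_totals(lines, prefix="en.z Special:Random "):
    """Yield (title, monthly_total) for every line that starts with `prefix`.

    Mirrors `grep "^en.z Special:Random "`: the trailing space in the prefix
    keeps titles such as Special:Random/Talk from matching.
    Assumed field order (not confirmed here): project, title, monthly total, ...
    """
    for line in lines:
        if not line.startswith(prefix):
            continue
        fields = line.split(" ")
        if len(fields) < 3:
            continue  # line too short to carry a total
        try:
            yield fields[1], int(fields[2])
        except ValueError:
            continue  # third field is not a plain integer


# Made-up sample lines in the assumed layout.
sample = [
    "en.z Special:Random 31000 A1B2",
    "en.z Special:Random/Talk 12 A1",
    "en.z Special:RandomPage 5 A5",
    "de.z Special:Random 99 A9",
]

if __name__ == "__main__":
    for title, total in monthly_totals(sample):
        print(title, total)
```

To run it over a real file, the same function can be passed an open file object, since it only iterates over lines.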

3 3

How to explain drop in random searches
by Alex Druk 10 May '15

10 May '15

Hi everyone,

I am trying to understand the dynamics of random searches (Special:Random) on English Wikipedia.

From 01/2012 to 10/2014 the average number of random searches per month was about 86 million, or about 30% of Main_Page pageviews, but from November 2014 it dropped to 31,000 per month (or 0.008% of Main_Page).

How can such a dramatic drop be explained? Any ideas?

--
Thank you.

Alex Druk, PhD
wikipediatrends.com
alex.druk(a)gmail.com
(775) 237-8550 Google voice

2 1

Fwd: Traffic to the portal from Zero providers
by Oliver Keyes 07 May '15

07 May '15

Cross-posting to research and analytics, too!

---------- Forwarded message ----------
From: Oliver Keyes <okeyes(a)wikimedia.org>
Date: 6 May 2015 at 13:11
Subject: Traffic to the portal from Zero providers
To: wikimedia-search(a)lists.wikimedia.org

Hey all,

(Throwing this to the public list, because transparency is Good)

I recently did a presentation on a traffic analysis to the Wikipedia "home page" - www.wikipedia.org.[1] One of the biggest visualisations, in impact terms, showed that a lot of portal traffic - far more, proportionately, than traffic to Wikipedia overall - is coming from India and Brazil.[2] One of the hypotheses was that this could be Zero traffic.

I've done a basic analysis of the traffic, looking specifically at the zero headers,[3] and this hypothesis turns out to be incorrect - almost no zero traffic is hitting the portal. The traffic we're seeing from Brazil and India is not zero-based.

This makes a lot of sense (the reason mobile traffic redirects to the enwiki home page from the portal is the Zero extension, so presumably this happens specifically to Zero traffic) but it does mean that our null hypothesis - that this traffic is down to ISP-level or device-level design choices and links - is more likely to be correct.

[1] http://ironholds.org/misc/homepage_presentation.html
[2] http://ironholds.org/misc/homepage_presentation.html#/11
[3] https://phabricator.wikimedia.org/T98076

--
Oliver Keyes
Research Analyst
Wikimedia Foundation

--
Oliver Keyes
Research Analyst
Wikimedia Foundation

9 22
