Cool. In that case, I will generate a dump for all the data we have, report back when done, and if there are no issues with releasing it, tarball it up and put it on figshare :)
On 15 April 2015 at 13:27, Dario Taraborelli dtaraborelli@wikimedia.org wrote:
thanks, both. Let's go ahead with English only and no spiders filtered or mobile/desktop breakdown, per Oliver.
Michelle – given the aggregation level I am fine moving forward with this release, but let me know off-thread if you have any questions.
Dario
On Wed, Apr 15, 2015 at 9:53 AM, Oliver Keyes okeyes@wikimedia.org wrote:
Dario,
No spider filtering, and no split between mobile and desktop; mobile and desktop are grouped.
On 15 April 2015 at 12:46, Hirav Gandhi hirav.gandhi@gmail.com wrote:
e.g. German*
I need more coffee.
On Wed, Apr 15, 2015 at 9:35 AM, Hirav Gandhi hirav.gandhi@gmail.com wrote:
Dario - we just want a representative sample of traffic for a popular site like Wikipedia. We thought limiting to the English Wikipedia would be easier.
If we get aggregated data across all language Wikipedia sites, we would need some way to tease out which language is being queried when. Some languages (for e.g. German) we would hypothesize would have more daily seasonality than languages like English.
On Wed, Apr 15, 2015 at 9:32 AM, Dario Taraborelli dtaraborelli@wikimedia.org wrote:
Hirav, Bharath – I also want to hear from you if there's a specific reason to ask for English Wikipedia only or if a dataset encompassing aggregate pageviews across all Wikimedia properties would do the job.
Dario
On Wed, Apr 15, 2015 at 9:09 AM, Dario Taraborelli dtaraborelli@wikimedia.org wrote:
Oliver -- thanks for running a preliminary check. I'm fine releasing this data in aggregate under CC0; I believe it would be valuable for this and other research projects (copying Michelle from Legal).
Before we do so, though, I want to confirm the specs: aggregate pageviews per second to English Wikipedia, excluding bot traffic, broken down by access method (mobile web vs desktop site, not apps) for a 60-day period. Oliver – are these the filters you used to identify the data point with the smallest number of observations?
Obviously, we will need to take this release into account when we start working on projects such as
https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_edits and
https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pagevi...
Dario
On Mon, Apr 13, 2015 at 9:37 PM, Oliver Keyes okeyes@wikimedia.org wrote:
>
> Bumping for Dario, per Pine's excellent example :)
>
> On 13 April 2015 at 22:18, Hirav Gandhi hirav.gandhi@gmail.com wrote:
> > Oliver: Two months is fine. Thank you so much for your help!
> >
> >> On Apr 13, 2015, at 4:40 PM, analytics-request@lists.wikimedia.org wrote:
> >>
> >> Today's Topics:
> >>
> >> 1. Re: Page views on a more frequent than hourly basis (Pine W)
> >> 2. Re: Page views on a more frequent than hourly basis (Hirav Gandhi)
> >> 3. Re: Page views on a more frequent than hourly basis (Oliver Keyes)
> >>
> >> ----------------------------------------------------------------------
> >>
> >> Message: 1
> >> Date: Mon, 13 Apr 2015 13:34:23 -0700
> >> From: Pine W wiki.pine@gmail.com
> >> To: "A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics." analytics@lists.wikimedia.org
> >> Subject: Re: [Analytics] Page views on a more frequent than hourly basis
> >> Message-ID: CAF=dyJjZMdfTHZ+0+LwnHb9m8xUOd4WetGCFUXYB9Qyf7CyC5Q@mail.gmail.com
> >> Content-Type: text/plain; charset="utf-8"
> >>
> >> Hi Oliver, re ccing people who are on list, this is the protocol we followed in IEGCom to ping people who are subscribed and mentioned in certain emails but, like many of us, may automatically move emails from lists directly to folders where they may be unread for days. So there is a reason to do this.
> >>
> >> Thanks,
> >>
> >> Pine
> >>