Hi Dario,

One last question: would it be possible to break the data out into mobile vs. desktop? We are also concerned that there might be seasonality effects in it. Please let us know.

Best,

Hirav
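
A rough way to look for the daily seasonality raised above, once per-second counts exist, is to average the counts by hour of day and compare the busiest hour with the quietest. The sketch below is only an illustration: the function name, the input layout (a dict mapping each second to its pageview count) and the sample numbers are all assumptions, not part of the planned release.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_profile(per_second_counts):
    """Average pageviews per second for each hour of day (0-23).

    per_second_counts: dict mapping datetime (whole seconds) -> pageviews.
    This layout is an assumption made for the example.
    """
    totals = defaultdict(int)   # summed pageviews per hour of day
    seconds = defaultdict(int)  # number of one-second buckets seen per hour
    for second, views in per_second_counts.items():
        totals[second.hour] += views
        seconds[second.hour] += 1
    return {hour: totals[hour] / seconds[hour] for hour in sorted(totals)}


# Made-up data: two days, three seconds sampled at 06:00 and at 18:00.
start = datetime(2015, 4, 12)
data = {}
for day in range(2):
    for hour, views in ((6, 3000), (18, 6000)):
        for s in range(3):
            data[start + timedelta(days=day, hours=hour, seconds=s)] = views

profile = hourly_profile(data)
# Peak-to-trough ratio: a crude indicator of how strong the daily cycle is.
ratio = max(profile.values()) / min(profile.values())
print(profile, ratio)
```

A ratio close to 1 would suggest little daily seasonality; a larger ratio suggests a stronger day/night cycle. With mobile and desktop grouped, this only describes the combined traffic.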



On Wed, Apr 15, 2015 at 10:27 AM, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:

Thanks, both. Let's go ahead with English Wikipedia only, with no spider filtering and no mobile/desktop breakdown, per Oliver.

Michelle – given the aggregation level I am fine moving forward with this release, but let me know off-thread if you have any questions.

Dario

On Wed, Apr 15, 2015 at 9:53 AM, Oliver Keyes <okeyes@wikimedia.org> wrote:
Dario,

No spider filtering, and no split between mobile and desktop; mobile
and desktop are grouped.
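
For readers of this thread, the aggregation under discussion (pageviews per second to enwiki, mobile and desktop grouped, no spider filtering) and the smallest-bucket check described further down can be sketched roughly as follows. This is a minimal illustration, not the query actually run: the input format (one ISO 8601 timestamp string per pageview), the function names and the sample values are assumptions.

```python
from collections import Counter
from datetime import datetime


def pageviews_per_second(timestamps):
    """Count pageviews in each one-second bucket.

    timestamps: iterable of ISO 8601 strings, one per pageview
    (a hypothetical input format chosen for this example).
    """
    counts = Counter()
    for ts in timestamps:
        # Truncate to whole seconds so requests in the same second share a key.
        second = datetime.fromisoformat(ts).replace(microsecond=0)
        counts[second] += 1
    return counts


def smallest_bucket(counts):
    """Return (second, pageviews) for the least-populated second."""
    return min(counts.items(), key=lambda item: item[1])


# Made-up sample: three requests spread over two seconds.
sample = [
    "2015-04-12T06:00:00.120",
    "2015-04-12T06:00:00.870",
    "2015-04-12T06:00:01.450",
]
counts = pageviews_per_second(sample)
second, views = smallest_bucket(counts)
print(second.isoformat(), views)
```

Seconds with no pageviews at all never appear in the counter, so a real check would also need to account for empty seconds before treating the minimum as the smallest group size.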

On 15 April 2015 at 12:46, Hirav Gandhi <hirav.gandhi@gmail.com> wrote:
> e.g. German*
>
> I need more coffee.
>
>
>
> On Wed, Apr 15, 2015 at 9:35 AM, Hirav Gandhi <hirav.gandhi@gmail.com>
> wrote:
>>
>> Dario - we just want a representative sample of traffic for a popular
>> site like Wikipedia. We thought limiting to the English Wikipedia would be
>> easier.
>>
>> If we get aggregated data across all language Wikipedia sites, we would
>> need some way to tease out which language is being queried when. Some
>> languages (for e.g. German) we would hypothesize would have more daily
>> seasonality than languages like English.
>>
>>
>>
>> On Wed, Apr 15, 2015 at 9:32 AM, Dario Taraborelli
>> <dtaraborelli@wikimedia.org> wrote:
>>>
>>> Hirav, Bharath – I also want to hear from you if there's a specific
>>> reason to ask for English Wikipedia only or if a dataset encompassing
>>> aggregate pageviews across all Wikimedia properties would do the job.
>>>
>>> Dario
>>>
>>> On Wed, Apr 15, 2015 at 9:09 AM, Dario Taraborelli
>>> <dtaraborelli@wikimedia.org> wrote:
>>>>
>>>> Oliver -- thanks for running a preliminary check. I'm fine releasing
>>>> this data in aggregate under CC0; I believe it would be valuable for this
>>>> and other research projects (copying Michelle from Legal).
>>>>
>>>> Before we do so, though, I want to confirm the specs: aggregate
>>>> pageviews per second to English Wikipedia, excluding bot traffic, broken
>>>> down by access method (mobile web vs desktop site, not apps) for a 60-day
>>>> period. Oliver – are these the filters you used to identify the data point
>>>> with the smallest number of observations?
>>>>
>>>> Obviously, we will need to take into account this release when we start
>>>> working on projects such as
>>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_edits
>>>> and
>>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pageviews
>>>>
>>>> Dario
>>>>
>>>> On Mon, Apr 13, 2015 at 9:37 PM, Oliver Keyes <okeyes@wikimedia.org>
>>>> wrote:
>>>>>
>>>>> Bumping for Dario, per Pine's excellent example :)
>>>>>
>>>>> On 13 April 2015 at 22:18, Hirav Gandhi <hirav.gandhi@gmail.com> wrote:
>>>>> > Oliver: Two months is fine. Thank you so much for your help!
>>>>> >
>>>>> >> On Apr 13, 2015, at 4:40 PM, analytics-request@lists.wikimedia.org
>>>>> >> wrote:
>>>>> >>
>>>>> >> Send Analytics mailing list submissions to
>>>>> >>       analytics@lists.wikimedia.org
>>>>> >>
>>>>> >> To subscribe or unsubscribe via the World Wide Web, visit
>>>>> >>       https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>> >> or, via email, send a message with subject or body 'help' to
>>>>> >>       analytics-request@lists.wikimedia.org
>>>>> >>
>>>>> >> You can reach the person managing the list at
>>>>> >>       analytics-owner@lists.wikimedia.org
>>>>> >>
>>>>> >> When replying, please edit your Subject line so it is more specific
>>>>> >> than "Re: Contents of Analytics digest..."
>>>>> >>
>>>>> >>
>>>>> >> Today's Topics:
>>>>> >>
>>>>> >>   1. Re: Page views on a more frequent than hourly basis (Pine W)
>>>>> >>   2. Re: Page views on a more frequent than hourly basis (Hirav Gandhi)
>>>>> >>   3. Re: Page views on a more frequent than hourly basis (Oliver Keyes)
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> ----------------------------------------------------------------------
>>>>> >>
>>>>> >> Message: 1
>>>>> >> Date: Mon, 13 Apr 2015 13:34:23 -0700
>>>>> >> From: Pine W <wiki.pine@gmail.com>
>>>>> >> To: "A mailing list for the Analytics Team at WMF and everybody who
>>>>> >>       has an  interest in Wikipedia and analytics."
>>>>> >>       <analytics@lists.wikimedia.org>
>>>>> >> Subject: Re: [Analytics] Page views on a more frequent than hourly
>>>>> >>       basis
>>>>> >> Message-ID:
>>>>> >>
>>>>> >> <CAF=dyJjZMdfTHZ+0+LwnHb9m8xUOd4WetGCFUXYB9Qyf7CyC5Q@mail.gmail.com>
>>>>> >> Content-Type: text/plain; charset="utf-8"
>>>>> >>
>>>>> >> Hi Oliver, re ccing people who are on list, this is the protocol we
>>>>> >> followed in IEGCom to ping people who are subscribed and mentioned in
>>>>> >> certain emails but, like many of us, may automatically move emails from
>>>>> >> lists directly to folders where they may be unread for days. So there
>>>>> >> is a reason to do this.
>>>>> >>
>>>>> >> Thanks,
>>>>> >>
>>>>> >> Pine
>>>>> >> -------------- next part --------------
>>>>> >> An HTML attachment was scrubbed...
>>>>> >> URL:
>>>>> >> <https://lists.wikimedia.org/pipermail/analytics/attachments/20150413/aac0ef89/attachment-0001.html>
>>>>> >>
>>>>> >> ------------------------------
>>>>> >>
>>>>> >> Message: 2
>>>>> >> Date: Mon, 13 Apr 2015 16:30:43 -0700
>>>>> >> From: Hirav Gandhi <hirav.gandhi@gmail.com>
>>>>> >> To: analytics@lists.wikimedia.org
>>>>> >> Subject: Re: [Analytics] Page views on a more frequent than hourly
>>>>> >>       basis
>>>>> >> Message-ID:
>>>>> >>
>>>>> >> <CANzC_EOvi4MP7G_SsxvW=UOjPt2vXbNfMHcipqN1pumACE-eEw@mail.gmail.com>
>>>>> >> Content-Type: text/plain; charset="utf-8"
>>>>> >>
>>>>> >> Thanks Oliver!
>>>>> >>
>>>>> >> We would like this data for as broad a time period as you can muster.
>>>>> >> The more days, months and years represented in the dataset, the better.
>>>>> >>
>>>>> >>
>>>>> >>> Okay, so:
>>>>> >>>
>>>>> >>> I took an hour from the pageviews logs,[0] and aggregated pageviews to
>>>>> >>> enwiki (mobile and desktop both) by timestamp, down to one-second
>>>>> >>> resolution levels. The lowest number of pageviews to enwiki per second
>>>>> >>> was 2,981.
>>>>> >>>
>>>>> >>> So, I don't personally have a problem with generating a release of:
>>>>> >>>
>>>>> >>> 1. Pageviews per second;
>>>>> >>> 2. To enwiki;
>>>>> >>> 3. Over $TIME_PERIOD;
>>>>> >>> 4. Grouping the mobile and desktop sites.
>>>>> >>>
>>>>> >>> But Dario or someone should chip in before I touch anything ;p
>>>>> >>>
>>>>> >>> [0] 6am yesterday. 6am because it should be low-traffic, right? At least
>>>>> >>> given our biases towards North America and Europe.
>>>>> >>>
>>>>> >>> On 13 April 2015 at 11:54, Oliver Keyes <okeyes@wikimedia.org>
>>>>> >>> wrote:
>>>>> >>>> Then that sounds much more viable. I'll run a quick test now to see
>>>>> >>>> how much clustering we'd see at, say, the one-second resolution level,
>>>>> >>>> and throw it out here so we can make more informed decisions about a
>>>>> >>>> data release on this.
>>>>> >>>>
>>>>> >>>> On 13 April 2015 at 08:08, Hirav Gandhi <hirav.gandhi@gmail.com>
>>>>> >>>> wrote:
>>>>> >>>>> Hi Oliver,
>>>>> >>>>>
>>>>> >>>>> Re: Hirav: would you be looking for temporally /and/ contextually
>>>>> >>>>> granular pageviews, i.e. "a view to X page at Y time", or just
>>>>> >>>>> temporally granular, so "a view to a page on enwiki at X time"? If the
>>>>> >>>>> latter you've got more of a shot, I suspect.
>>>>> >>>>>
>>>>> >>>>> I only want the latter - I am not concerned with the context so much as
>>>>> >>>>> just “a view to a page on enwiki at X time.”
>>>>> >>>>>
>>>>> >>>>> Hirav
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> On Apr 13, 2015, at 5:00 AM,
>>>>> >>>>> analytics-request@lists.wikimedia.org
>>>>> >>> wrote:
>>>>> >>>>>
>>>>> >>>>> Today's Topics:
>>>>> >>>>>
>>>>> >>>>>  1. Re: Page views on a more frequent than hourly basis (Pine W)
>>>>> >>>>>  2. Re: Page views on a more frequent than hourly basis (Oliver Keyes)
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> ----------------------------------------------------------------------
>>>>> >>>>>
>>>>> >>>>> Message: 1
>>>>> >>>>> Date: Mon, 13 Apr 2015 00:47:31 -0700
>>>>> >>>>> From: Pine W <wiki.pine@gmail.com>
>>>>> >>>>> To: "A mailing list for the Analytics Team at WMF and everybody
>>>>> >>>>> who
>>>>> >>>>> has an interest in Wikipedia and analytics."
>>>>> >>>>> <analytics@lists.wikimedia.org>
>>>>> >>>>> Cc: Bharath Sitaraman <bharath1028@gmail.com>
>>>>> >>>>> Subject: Re: [Analytics] Page views on a more frequent than
>>>>> >>>>> hourly
>>>>> >>>>> basis
>>>>> >>>>> Message-ID:
>>>>> >>>>>
>>>>> >>>>> <CAF=dyJgNUT+t6n6muJq16DuYiWP7et6ruHT3_-TZDnseP+29QQ@mail.gmail.com>
>>>>> >>>>> Content-Type: text/plain; charset="utf-8"
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> Hi,
>>>>> >>>>>
>>>>> >>>>> This issue of pageview data granularity has been discussed before, and
>>>>> >>>>> the answer has been that hourly is the smallest increment allowed to be
>>>>> >>>>> revealed publicly, for privacy reasons.
>>>>> >>>>>
>>>>> >>>>> I believe that the person you will want to discuss your request with is
>>>>> >>>>> Toby, who I have cc'd here.
>>>>> >>>>>
>>>>> >>>>> Pine
>>>>> >>>>> On Apr 13, 2015 12:11 AM, "Hirav Gandhi" <hirav.gandhi@gmail.com>
>>>>> >>> wrote:
>>>>> >>>>>
>>>>> >>>>> Hi Wikimedia Analytics Team,
>>>>> >>>>>
>>>>> >>>>> My colleague Bharath and I are doing research on dynamic server
>>>>> >>>>> allocation algorithms and we were looking for a suitable dataset to
>>>>> >>>>> test our predictive algorithm on. We noticed that Wikimedia has an
>>>>> >>>>> amazing data set of hourly page views, but we were looking for
>>>>> >>>>> something a bit more granular, such as aggregated page requests to
>>>>> >>>>> English Wikipedia on a minute-by-minute or second-by-second basis if
>>>>> >>>>> possible.
>>>>> >>>>>
>>>>> >>>>> We are more than happy to pore through any raw data you might have that
>>>>> >>>>> would help us calculate page requests at this granular level. Please
>>>>> >>>>> let us know if it would be possible to get such data and if so how.
>>>>> >>>>> Thank you in advance for your help.
>>>>> >>>>>
>>>>> >>>>> Best,
>>>>> >>>>>
>>>>> >>>>> Hirav Gandhi
>>>>> >>>>> _______________________________________________
>>>>> >>>>> Analytics mailing list
>>>>> >>>>> Analytics@lists.wikimedia.org
>>>>> >>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>> >>>>>
>>>>> >>>>> ------------------------------
>>>>> >>>>>
>>>>> >>>>> Message: 2
>>>>> >>>>> Date: Mon, 13 Apr 2015 06:39:45 -0400
>>>>> >>>>> From: Oliver Keyes <okeyes@wikimedia.org>
>>>>> >>>>> To: "A mailing list for the Analytics Team at WMF and everybody
>>>>> >>>>> who
>>>>> >>>>> has an interest in Wikipedia and analytics."
>>>>> >>>>> <analytics@lists.wikimedia.org>
>>>>> >>>>> Cc: Bharath Sitaraman <bharath1028@gmail.com>
>>>>> >>>>> Subject: Re: [Analytics] Page views on a more frequent than
>>>>> >>>>> hourly
>>>>> >>>>> basis
>>>>> >>>>> Message-ID:
>>>>> >>>>>
>>>>> >>>>> <CAAUQgdDsnHd8s+ACL-XBtXBz6OO-T04CcJfnGfqwrYAV-=hxPg@mail.gmail.com>
>>>>> >>>>> Content-Type: text/plain; charset=UTF-8
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> Preeetty sure that Toby is on the analytics list, Pine. He's the
>>>>> >>>>> director of analytics.
>>>>> >>>>>
>>>>> >>>>> Hirav: would you be looking for temporally /and/ contextually granular
>>>>> >>>>> pageviews, i.e. "a view to X page at Y time", or just temporally
>>>>> >>>>> granular, so "a view to a page on enwiki at X time"? If the latter
>>>>> >>>>> you've got more of a shot, I suspect.
>>>>> >>>>>
>>>>> >>>>> --
>>>>> >>>>> Oliver Keyes
>>>>> >>>>> Research Analyst
>>>>> >>>>> Wikimedia Foundation
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> ------------------------------
>>>>> >>>>>
>>>>> >>>>> End of Analytics Digest, Vol 38, Issue 21
>>>>> >>>>> *****************************************
>>>>> >> ------------------------------
>>>>> >>
>>>>> >> Message: 3
>>>>> >> Date: Mon, 13 Apr 2015 19:40:04 -0400
>>>>> >> From: Oliver Keyes <okeyes@wikimedia.org>
>>>>> >> To: "A mailing list for the Analytics Team at WMF and everybody who
>>>>> >>       has an  interest in Wikipedia and analytics."
>>>>> >>       <analytics@lists.wikimedia.org>
>>>>> >> Subject: Re: [Analytics] Page views on a more frequent than hourly
>>>>> >>       basis
>>>>> >> Message-ID:
>>>>> >>
>>>>> >> <CAAUQgdD6Z5USsu11VW49fDMBSrhYEjxKU9yOPySEriB79J-5Cg@mail.gmail.com>
>>>>> >> Content-Type: text/plain; charset=UTF-8
>>>>> >>
>>>>> >> ....
>>>>> >>
>>>>> >>
>>>>> >> ...years?
>>>>> >>
>>>>> >> We have unsampled logs for, ah. 2 months.
>>>>> >>
>>>>> >> --
>>>>> >> Oliver Keyes
>>>>> >> Research Analyst
>>>>> >> Wikimedia Foundation
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> ------------------------------
>>>>> >>
>>>>> >>
>>>>> >> End of Analytics Digest, Vol 38, Issue 24
>>>>> >> *****************************************
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Oliver Keyes
>>>>> Research Analyst
>>>>> Wikimedia Foundation
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Dario Taraborelli
>>>> Senior Research Scientist, Research and Data Lead
>>>> Wikimedia Foundation
>>>> http://wikimediafoundation.org
>>>> http://nitens.org/taraborelli
>>>
>>>
>>>
>>>
>>
>>
>



--
Oliver Keyes
Research Analyst
Wikimedia Foundation

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics



--
Dario Taraborelli
Senior Research Scientist, Research and Data Lead
Wikimedia Foundation