Thanks Oliver!

We would like this data for as broad a time period as you can muster. The more days, months and years represented in the dataset, the better.


Okay, so:

I took an hour from the pageviews logs,[0] and aggregated pageviews to
enwiki (mobile and desktop both) by timestamp, down to one-second
resolution. The lowest number of pageviews to enwiki per second
was 2,981.
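
For anyone who wants to reproduce that kind of count, here is a minimal
sketch of bucketing request timestamps into one-second bins and taking
the quietest second. It is an illustration only, not the query that
produced the figure above; the function names and the sample timestamps
are made up, and it assumes the timestamps have already been parsed out
of the logs.

```python
from collections import Counter
from datetime import datetime, timedelta


def pageviews_per_second(timestamps):
    """Bucket request timestamps into one-second bins and count each bin."""
    return Counter(ts.replace(microsecond=0) for ts in timestamps)


def lowest_per_second(counts, start, seconds):
    """Smallest per-second count over a window, counting empty seconds as 0."""
    return min(counts.get(start + timedelta(seconds=i), 0) for i in range(seconds))


# Hypothetical sample: three requests in second 0, one in second 1, two in second 2.
start = datetime(2015, 4, 12, 6, 0, 0)
sample = [
    start + timedelta(milliseconds=ms)
    for ms in (10, 400, 950, 1200, 2100, 2999)
]

counts = pageviews_per_second(sample)
lowest = lowest_per_second(counts, start, 3)
print(lowest)
```

Seconds with no requests are counted as zero here, so a gap in the logs
lowers the minimum instead of being silently skipped.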

So, I don't personally have a problem with generating a release of:

1. Pageviews per second;
2. To enwiki;
3. Over $TIME_PERIOD;
4. Grouping the mobile and desktop sites.

But Dario or someone should chip in before I touch anything ;p

[0] 6am yesterday. 6am because it should be low-traffic, right? At least
given our biases towards North America and Europe.

On 13 April 2015 at 11:54, Oliver Keyes <okeyes@wikimedia.org> wrote:
> Then that sounds much more viable. I'll run a quick test now to see
> how much clustering we'd see at, say, the one-second resolution level,
> and throw it out here so we can make more informed decisions about a
> data release on this.
>
> On 13 April 2015 at 08:08, Hirav Gandhi <hirav.gandhi@gmail.com> wrote:
>> Hi Oliver,
>>
>> Re: Hirav: would you be looking for temporally /and/ contextually granular
>> pageviews, i.e. "a view to X page at Y time", or just temporally granular,
>> so "a view to a page on enwiki at X time"? If the latter you've got more of
>> a shot, I suspect.
>>
>> I only want the latter - I am not concerned with the context so much as just
>> “a view to a page on enwiki at X time.”
>>
>> Hirav
>>
>>
>> On Apr 13, 2015, at 5:00 AM, analytics-request@lists.wikimedia.org wrote:
>>
>> Today's Topics:
>>
>>   1. Re: Page views on a more frequent than hourly basis (Pine W)
>>   2. Re: Page views on a more frequent than hourly basis (Oliver Keyes)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Mon, 13 Apr 2015 00:47:31 -0700
>> From: Pine W <wiki.pine@gmail.com>
>> To: "A mailing list for the Analytics Team at WMF and everybody who
>> has an interest in Wikipedia and analytics."
>> <analytics@lists.wikimedia.org>
>> Cc: Bharath Sitaraman <bharath1028@gmail.com>
>> Subject: Re: [Analytics] Page views on a more frequent than hourly
>> basis
>> Message-ID:
>> <CAF=dyJgNUT+t6n6muJq16DuYiWP7et6ruHT3_-TZDnseP+29QQ@mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>>
>> Hi,
>>
>> This issue of pageview data granularity has been discussed before, and the
>> answer has been that hourly is the smallest increment allowed to be
>> revealed publicly, for privacy reasons.
>>
>> I believe that the person you will want to discuss your request with is
>> Toby, who I have cc'd here.
>>
>> Pine
>> On Apr 13, 2015 12:11 AM, "Hirav Gandhi" <hirav.gandhi@gmail.com> wrote:
>>
>> Hi Wikimedia Analytics Team,
>>
>> My colleague Bharath and I are doing research on dynamic server allocation
>> algorithms and we were looking for a suitable dataset to test our
>> predictive algorithm on. We noticed that Wikimedia has an amazing data set
>> of hourly page views, but we were looking for something a bit more
>> granular, such as aggregated page requests to English Wikipedia on a minute
>> by minute basis or second by second basis if possible.
>>
>> We are more than happy to pore through any raw data you might have that
>> would help us calculate page requests at this granular level. Please let us
>> know if it would be possible to get such data and if so how. Thank you in
>> advance for your help.
>>
>> Best,
>>
>> Hirav Gandhi
>> _______________________________________________
>> Analytics mailing list
>> Analytics@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Mon, 13 Apr 2015 06:39:45 -0400
>> From: Oliver Keyes <okeyes@wikimedia.org>
>> To: "A mailing list for the Analytics Team at WMF and everybody who
>> has an interest in Wikipedia and analytics."
>> <analytics@lists.wikimedia.org>
>> Cc: Bharath Sitaraman <bharath1028@gmail.com>
>> Subject: Re: [Analytics] Page views on a more frequent than hourly
>> basis
>> Message-ID:
>> <CAAUQgdDsnHd8s+ACL-XBtXBz6OO-T04CcJfnGfqwrYAV-=hxPg@mail.gmail.com>
>> Content-Type: text/plain; charset=UTF-8
>>
>>
>> Preeetty sure that Toby is on the analytics list, Pine. He's the
>> director of analytics.
>>
>> Hirav: would you be looking for temporally /and/ contextually granular
>> pageviews, i.e. "a view to X page at Y time", or just temporally
>> granular, so "a view to a page on enwiki at X time"? If the latter
>> you've got more of a shot, I suspect.
>>
>>
>> --
>> Oliver Keyes
>> Research Analyst
>> Wikimedia Foundation
>>
>>
>>
>> ------------------------------
>>
>> End of Analytics Digest, Vol 38, Issue 21
>> *****************************************
>>
>



--
Oliver Keyes
Research Analyst
Wikimedia Foundation


