Thanks Oliver!
We would like this data for as broad a time period as you can muster.
The more days, months and years represented in the dataset, the better.
Okay, so:
I took an hour from the pageviews logs,[0] and aggregated pageviews to
enwiki (mobile and desktop both) by timestamp, down to one-second
resolution levels. The lowest number of pageviews to enwiki per second
was 2,981.
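
For illustration only, the aggregation described above could be sketched
roughly as follows. This is not the query that was actually run against the
pageviews logs: the function name, the ISO 8601 timestamp format and the
sample data are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime


def pageviews_per_second(timestamps):
    """Count requests per one-second bucket.

    `timestamps` is an iterable of ISO 8601 strings (an assumed format);
    any sub-second part is truncated so each view lands in its second.
    """
    counts = Counter()
    for ts in timestamps:
        second = datetime.fromisoformat(ts).replace(microsecond=0)
        counts[second] += 1
    return counts


# Hypothetical sample: a handful of views around 6am.
sample = [
    "2015-04-12T06:00:00",
    "2015-04-12T06:00:00.500",
    "2015-04-12T06:00:00",
    "2015-04-12T06:00:01",
    "2015-04-12T06:00:02",
    "2015-04-12T06:00:02",
]

counts = pageviews_per_second(sample)
# The smallest bucket is the figure that matters for a privacy check.
lowest = min(counts.values())
print(f"lowest pageviews in any observed second: {lowest}")
```

Note that a second with no views at all never appears in the counter, so the
minimum here is over observed seconds only.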
So, I don't personally have a problem with generating a release of:
1. Pageviews per second;
2. To enwiki;
3. Over $TIME_PERIOD;
4. Grouping the mobile and desktop sites.
But Dario or someone should chip in before I touch anything ;p
[0] 6am yesterday. 6am because it should be low-traffic, right? At least
given our biases towards North America and Europe.
On 13 April 2015 at 11:54, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
Then that sounds much more viable. I'll run a quick test now to see
how much clustering we'd see at, say, the one-second resolution level,
and throw it out here so we can make more informed decisions about a
data release on this.
On 13 April 2015 at 08:08, Hirav Gandhi <hirav.gandhi(a)gmail.com> wrote:
> Hi Oliver,
>
> Re: Hirav: would you be looking for temporally /and/ contextually granular
> pageviews, i.e. "a view to X page at Y time", or just temporally granular,
> so "a view to a page on enwiki at X time"? If the latter you've got more of
> a shot, I suspect.
>
> I only want the latter - I am not concerned with the context so much as just
> “a view to a page on enwiki at X time.”
>
> Hirav
>
>
> On Apr 13, 2015, at 5:00 AM, analytics-request(a)lists.wikimedia.org wrote:
>
> Today's Topics:
>
> 1. Re: Page views on a more frequent than hourly basis (Pine W)
> 2. Re: Page views on a more frequent than hourly basis (Oliver Keyes)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 13 Apr 2015 00:47:31 -0700
> From: Pine W <wiki.pine(a)gmail.com>
> To: "A mailing list for the Analytics Team at WMF and everybody who
> has an interest in Wikipedia and analytics."
> <analytics(a)lists.wikimedia.org>
> Cc: Bharath Sitaraman <bharath1028(a)gmail.com>
> Subject: Re: [Analytics] Page views on a more frequent than hourly
> basis
> Message-ID:
> <CAF=dyJgNUT+t6n6muJq16DuYiWP7et6ruHT3_-TZDnseP+29QQ(a)mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
>
> Hi,
>
> This issue of pageview data granularity has been discussed before, and the
> answer has been that hourly is the smallest increment allowed to be
> revealed publicly, for privacy reasons.
>
> I believe that the person you will want to discuss your request with is
> Toby, who I have cc'd here.
>
> Pine
> On Apr 13, 2015 12:11 AM, "Hirav Gandhi" <hirav.gandhi(a)gmail.com> wrote:
>
> Hi Wikimedia Analytics Team,
>
> My colleague Bharath and I are doing research on dynamic server allocation
> algorithms and we were looking for suitable datasets to test our
> predictive algorithm on. We noticed that Wikimedia has an amazing data set
> of hourly page views, but we were looking for something a bit more
> granular, such as aggregated page requests to English Wikipedia on a
> minute-by-minute or second-by-second basis if possible.
>
> We are more than happy to pore through any raw data you might have that
> would help us calculate page requests at this granular level. Please let us
> know if it would be possible to get such data and if so how. Thank you in
> advance for your help.
>>
>> Best,
>>
>> Hirav Gandhi