....
...years?
We have unsampled logs for, ah, 2 months.
On 13 April 2015 at 19:30, Hirav Gandhi <hirav.gandhi(a)gmail.com> wrote:
> Thanks Oliver!
>
> We would like this data for as broad a time period as you can muster. The
> more days, months and years represented in the dataset, the better.
>
>>
>> Okay, so:
>>
>> I took an hour from the pageviews logs,[0] and aggregated pageviews to
>> enwiki (mobile and desktop both) by timestamp, down to one-second
>> resolution levels. The lowest number of pageviews to enwiki per second
>> was 2,981.
>>
>> So, I don't personally have a problem with generating a release of:
>>
>> 1. Pageviews per second;
>> 2. To enwiki;
>> 3. Over $TIME_PERIOD;
>> 4. Grouping the mobile and desktop sites.
>>
>> But Dario or someone should chip in before I touch anything ;p
>>
>> [0] 6am yesterday. 6am because it should be low-traffic, right? At least
>> given our biases towards North America and Europe.
>>
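The one-second aggregation test described above could be sketched roughly as follows. This is only an illustration, not the query that was actually run against the pageviews logs: the function names, the ISO 8601 input format and the sample timestamps are all assumptions made up for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def pageviews_per_second(timestamps):
    """Count requests per one-second bucket.

    `timestamps` is an iterable of ISO 8601 strings (an assumed input
    format); any sub-second precision is discarded.
    """
    counts = Counter()
    for ts in timestamps:
        second = datetime.fromisoformat(ts).replace(microsecond=0)
        counts[second] += 1
    return counts


def lowest_count(counts, start, end):
    """Smallest per-second count in [start, end); empty seconds count as 0."""
    lowest = None
    current = start
    while current < end:
        n = counts.get(current, 0)
        if lowest is None or n < lowest:
            lowest = n
        current += timedelta(seconds=1)
    return lowest


# Made-up sample data, not real log lines.
sample = [
    "2015-04-12T06:00:00",
    "2015-04-12T06:00:00",
    "2015-04-12T06:00:01",
    "2015-04-12T06:00:03",
]
counts = pageviews_per_second(sample)
start = datetime(2015, 4, 12, 6, 0, 0)
print(lowest_count(counts, start, start + timedelta(seconds=4)))
```

Walking every second in the window, rather than taking the minimum of the observed buckets, matters for the privacy question: a second with no requests at all never appears in the counter, and it is exactly those sparse seconds that a release would need to worry about.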
>> On 13 April 2015 at 11:54, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
>> > Then that sounds much more viable. I'll run a quick test now to see
>> > how much clustering we'd see at, say, the one-second resolution level,
>> > and throw it out here so we can make more informed decisions about a
>> > data release on this.
>> >
>> > On 13 April 2015 at 08:08, Hirav Gandhi <hirav.gandhi(a)gmail.com> wrote:
>> >> Hi Oliver,
>> >>
>> >> Re: Hirav: would you be looking for temporally /and/ contextually
>> >> granular pageviews, i.e. "a view to X page at Y time", or just
>> >> temporally granular, so "a view to a page on enwiki at X time"? If the
>> >> latter you've got more of a shot, I suspect.
>> >>
>> >> I only want the latter - I am not concerned with the context so much as
>> >> just “a view to a page on enwiki at X time.”
>> >>
>> >> Hirav
>> >>
>> >>
>> >> On Apr 13, 2015, at 5:00 AM, analytics-request(a)lists.wikimedia.org
>> >> wrote:
>> >>
>> >> Today's Topics:
>> >>
>> >> 1. Re: Page views on a more frequent than hourly basis (Pine W)
>> >> 2. Re: Page views on a more frequent than hourly basis (Oliver Keyes)
>> >>
>> >>
>> >> ----------------------------------------------------------------------
>> >>
>> >> Message: 1
>> >> Date: Mon, 13 Apr 2015 00:47:31 -0700
>> >> From: Pine W <wiki.pine(a)gmail.com>
>> >> To: "A mailing list for the Analytics Team at WMF and everybody who
>> >> has an interest in Wikipedia and analytics."
>> >> <analytics(a)lists.wikimedia.org>
>> >> Cc: Bharath Sitaraman <bharath1028(a)gmail.com>
>> >> Subject: Re: [Analytics] Page views on a more frequent than hourly
>> >> basis
>> >> Message-ID:
>> >> <CAF=dyJgNUT+t6n6muJq16DuYiWP7et6ruHT3_-TZDnseP+29QQ(a)mail.gmail.com>
>> >> Content-Type: text/plain; charset="utf-8"
>> >>
>> >>
>> >> Hi,
>> >>
>> >> This issue of pageview data granularity has been discussed before, and
>> >> the answer has been that hourly is the smallest increment allowed to be
>> >> revealed publicly, for privacy reasons.
>> >>
>> >> I believe that the person you will want to discuss your request with is
>> >> Toby, who I have cc'd here.
>> >>
>> >> Pine
>> >> On Apr 13, 2015 12:11 AM, "Hirav Gandhi" <hirav.gandhi(a)gmail.com>
>> >> wrote:
>> >>
>> >> Hi Wikimedia Analytics Team,
>> >>
>> >> My colleague Bharath and I are doing research on dynamic server
>> >> allocation algorithms and we were looking for a suitable dataset to
>> >> test our predictive algorithm on. We noticed that Wikimedia has an
>> >> amazing data set of hourly page views, but we were looking for
>> >> something a bit more granular, such as aggregated page requests to
>> >> English Wikipedia on a minute-by-minute or second-by-second basis if
>> >> possible.
>> >>
>> >> We are more than happy to pore through any raw data you might have that
>> >> would help us calculate page requests at this granular level. Please
>> >> let us know if it would be possible to get such data and if so how.
>> >> Thank you in advance for your help.
>> >>
>> >> Best,
>> >>
>> >> Hirav Gandhi
>> >> _______________________________________________
>> >> Analytics mailing list
>> >> Analytics(a)lists.wikimedia.org
>> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>
>> >>