Note: currently stalled on "the cluster is backlogged and the jam needs to clear before I can run things". My apologies!

On 15 April 2015 at 14:12, Michelle Paulson <mpaulson@wikimedia.org> wrote:
Looks good to me! 

-M

==
Michelle Paulson
Senior Legal Counsel
Wikimedia Foundation
149 New Montgomery Street, 6th Floor
San Francisco, CA 94105
mpaulson@wikimedia.org
415.839.6885 ext. 6608 (Office)
415.882.0495 (Fax)

NOTICE: This message may be confidential or legally privileged. If you have received it by accident, please delete it and let us know about the mistake. As an attorney for the Wikimedia Foundation and for legal/ethical reasons, I cannot give legal advice to, or serve as a lawyer for, community members, volunteers, or staff members in their personal capacity. For more on what this means, please see our legal disclaimer.

On Wed, Apr 15, 2015 at 10:41 AM, Bharath Sitaraman <bharath@cs.stanford.edu> wrote:
And thanks for doing all of this for us! We do greatly appreciate it!

Cheers,
Bharath

On Wed, Apr 15, 2015 at 10:40 AM, Bharath Sitaraman <bharath@cs.stanford.edu> wrote:
Interested in an Erlang book? :P Pretty sure I have one of those lying around here...

Cheers,
Bharath

On Wed, Apr 15, 2015 at 10:38 AM, Oliver Keyes <okeyes@wikimedia.org> wrote:
I accept payment in books, pull requests and speaking invitations ;p.

(Updated check-the-minimum query running now!)

On 15 April 2015 at 13:35, Hirav Gandhi <hirav.gandhi@gmail.com> wrote:
> Sorry Oliver. Let me know where I can send the beer/coffee money to
> compensate you for the hard work :)
>
>
>
> On Wed, Apr 15, 2015 at 10:34 AM, Oliver Keyes <okeyes@wikimedia.org> wrote:
>>
>> /This/ you say 2.5 seconds after I've launched the query ;p. Yes, it
>> is possible, but I'll have to recalculate the likely minimum and check
>> that it's still okay.
>>
>> On 15 April 2015 at 13:32, Hirav Gandhi <hirav.gandhi@gmail.com> wrote:
>> > Hi Dario,
>> >
>> > One last question - would it be possible to break it out into mobile vs
>> > desktop? We are also concerned there might be seasonality effects in
>> > there as well. Please let us know.
>> >
>> > Best,
>> >
>> > Hirav
>> >
>> >
>> >
>> > On Wed, Apr 15, 2015 at 10:27 AM, Dario Taraborelli
>> > <dtaraborelli@wikimedia.org> wrote:
>> >>
>> >> thanks, both. Let's go ahead with English only and no spiders filtered
>> >> or mobile/desktop breakdown, per Oliver.
>> >>
>> >> Michelle – given the aggregation level I am fine moving forward with
>> >> this release, but let me know off-thread if you have any questions.
>> >>
>> >> Dario
>> >>
>> >> On Wed, Apr 15, 2015 at 9:53 AM, Oliver Keyes <okeyes@wikimedia.org>
>> >> wrote:
>> >>>
>> >>> Dario,
>> >>>
>> >>> No spider filtering, and no split between mobile and desktop; mobile
>> >>> and desktop are grouped.
>> >>>
>> >>> On 15 April 2015 at 12:46, Hirav Gandhi <hirav.gandhi@gmail.com>
>> >>> wrote:
>> >>> > e.g. German*
>> >>> >
>> >>> > I need more coffee.
>> >>> >
>> >>> >
>> >>> >
>> >>> > On Wed, Apr 15, 2015 at 9:35 AM, Hirav Gandhi
>> >>> > <hirav.gandhi@gmail.com>
>> >>> > wrote:
>> >>> >>
>> >>> >> Dario - we just want a representative sample of traffic for a
>> >>> >> popular site like Wikipedia. We thought limiting to the English
>> >>> >> Wikipedia would be easier.
>> >>> >>
>> >>> >> If we get aggregated data across all language Wikipedia sites, we
>> >>> >> would need some way to tease out which language is being queried
>> >>> >> when. Some languages (for e.g. German) we would hypothesize would
>> >>> >> have more daily seasonality than languages like English.
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> On Wed, Apr 15, 2015 at 9:32 AM, Dario Taraborelli
>> >>> >> <dtaraborelli@wikimedia.org> wrote:
>> >>> >>>
>> >>> >>> Hirav, Bharath – I also want to hear from you if there's a
>> >>> >>> specific reason to ask for English Wikipedia only or if a dataset
>> >>> >>> encompassing aggregate pageviews across all Wikimedia properties
>> >>> >>> would do the job.
>> >>> >>>
>> >>> >>> Dario
>> >>> >>>
>> >>> >>> On Wed, Apr 15, 2015 at 9:09 AM, Dario Taraborelli
>> >>> >>> <dtaraborelli@wikimedia.org> wrote:
>> >>> >>>>
>> >>> >>>> Oliver -- thanks for running a preliminary check. I'm fine
>> >>> >>>> releasing this data in aggregate under CC0; I believe it would be
>> >>> >>>> valuable for this and other research projects (copying Michelle
>> >>> >>>> from Legal).
>> >>> >>>>
>> >>> >>>> Before we do so, though, I want to confirm the specs: aggregate
>> >>> >>>> pageviews per second to English Wikipedia, excluding bot traffic,
>> >>> >>>> broken down by access method (mobile web vs desktop site, not
>> >>> >>>> apps) for a 60-day period. Oliver – are these the filters you used
>> >>> >>>> to identify the data point with the smallest number of
>> >>> >>>> observations?
>> >>> >>>>
>> >>> >>>> Obviously, we will need to take into account this release when we
>> >>> >>>> start working on projects such as
>> >>> >>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_edits
>> >>> >>>> and
>> >>> >>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pageviews
>> >>> >>>>
>> >>> >>>> Dario
>> >>> >>>>
>> >>> >>>> On Mon, Apr 13, 2015 at 9:37 PM, Oliver Keyes
>> >>> >>>> <okeyes@wikimedia.org>
>> >>> >>>> wrote:
>> >>> >>>>>
>> >>> >>>>> Bumping for Dario, per Pine's excellent example :)
>> >>> >>>>>
>> >>> >>>>> On 13 April 2015 at 22:18, Hirav Gandhi <hirav.gandhi@gmail.com>
>> >>> >>>>> wrote:
>> >>> >>>>> > Oliver: Two months is fine. Thank you so much for your help!
>> >>> >>>>> >
>> >>> >>>>> >> On Apr 13, 2015, at 4:40 PM,
>> >>> >>>>> >> analytics-request@lists.wikimedia.org
>> >>> >>>>> >> wrote:
>> >>> >>>>> >>
>> >>> >>>>> >> Send Analytics mailing list submissions to
>> >>> >>>>> >> analytics@lists.wikimedia.org
>> >>> >>>>> >>
>> >>> >>>>> >> To subscribe or unsubscribe via the World Wide Web, visit
>> >>> >>>>> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>> >>>>> >> or, via email, send a message with subject or body 'help' to
>> >>> >>>>> >> analytics-request@lists.wikimedia.org
>> >>> >>>>> >>
>> >>> >>>>> >> You can reach the person managing the list at
>> >>> >>>>> >> analytics-owner@lists.wikimedia.org
>> >>> >>>>> >>
>> >>> >>>>> >> When replying, please edit your Subject line so it is more
>> >>> >>>>> >> specific
>> >>> >>>>> >> than "Re: Contents of Analytics digest..."
>> >>> >>>>> >>
>> >>> >>>>> >>
>> >>> >>>>> >> Today's Topics:
>> >>> >>>>> >>
>> >>> >>>>> >> 1. Re: Page views on a more frequent than hourly basis (Pine W)
>> >>> >>>>> >> 2. Re: Page views on a more frequent than hourly basis (Hirav Gandhi)
>> >>> >>>>> >> 3. Re: Page views on a more frequent than hourly basis (Oliver Keyes)
>> >>> >>>>> >>
>> >>> >>>>> >>
>> >>> >>>>> >>
>> >>> >>>>> >>
>> >>> >>>>> >>
>> >>> >>>>> >> ----------------------------------------------------------------------
>> >>> >>>>> >>
>> >>> >>>>> >> Message: 1
>> >>> >>>>> >> Date: Mon, 13 Apr 2015 13:34:23 -0700
>> >>> >>>>> >> From: Pine W <wiki.pine@gmail.com>
>> >>> >>>>> >> To: "A mailing list for the Analytics Team at WMF and
>> >>> >>>>> >> everybody
>> >>> >>>>> >> who
>> >>> >>>>> >> has an interest in Wikipedia and analytics."
>> >>> >>>>> >> <analytics@lists.wikimedia.org>
>> >>> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than
>> >>> >>>>> >> hourly
>> >>> >>>>> >> basis
>> >>> >>>>> >> Message-ID:
>> >>> >>>>> >>
>> >>> >>>>> >>
>> >>> >>>>> >>
>> >>> >>>>> >> <CAF=dyJjZMdfTHZ+0+LwnHb9m8xUOd4WetGCFUXYB9Qyf7CyC5Q@mail.gmail.com>
>> >>> >>>>> >> Content-Type: text/plain; charset="utf-8"
>> >>> >>>>> >>
>> >>> >>>>> >> Hi Oliver, re ccing people who are on list, this is the protocol
>> >>> >>>>> >> we followed in IEGCom to ping people who are subscribed and
>> >>> >>>>> >> mentioned in certain emails but, like many of us, may
>> >>> >>>>> >> automatically move emails from lists directly to folders where
>> >>> >>>>> >> they may be unread for days. So there is a reason to do this.
>> >>> >>>>> >>
>> >>> >>>>> >> Thanks,
>> >>> >>>>> >>
>> >>> >>>>> >> Pine
>> >>> >>>>> >> ------------------------------
>> >>> >>>>> >>
>> >>> >>>>> >> Message: 2
>> >>> >>>>> >> Date: Mon, 13 Apr 2015 16:30:43 -0700
>> >>> >>>>> >> From: Hirav Gandhi <hirav.gandhi@gmail.com>
>> >>> >>>>> >> To: analytics@lists.wikimedia.org
>> >>> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than
>> >>> >>>>> >> hourly
>> >>> >>>>> >> basis
>> >>> >>>>> >> Message-ID:
>> >>> >>>>> >>
>> >>> >>>>> >>
>> >>> >>>>> >>
>> >>> >>>>> >> <CANzC_EOvi4MP7G_SsxvW=UOjPt2vXbNfMHcipqN1pumACE-eEw@mail.gmail.com>
>> >>> >>>>> >> Content-Type: text/plain; charset="utf-8"
>> >>> >>>>> >>
>> >>> >>>>> >> Thanks Oliver!
>> >>> >>>>> >>
>> >>> >>>>> >> We would like this data for as broad a time period as you can
>> >>> >>>>> >> muster. The more days, months and years represented in the
>> >>> >>>>> >> dataset, the better.
>> >>> >>>>> >>
>> >>> >>>>> >>
>> >>> >>>>> >>> Okay, so:
>> >>> >>>>> >>>
>> >>> >>>>> >>> I took an hour from the pageviews logs,[0] and aggregated
>> >>> >>>>> >>> pageviews to enwiki (mobile and desktop both) by timestamp, down
>> >>> >>>>> >>> to one-second resolution levels. The lowest number of pageviews
>> >>> >>>>> >>> to enwiki per second was 2,981.
>> >>> >>>>> >>>
>> >>> >>>>> >>> So, I don't personally have a problem with generating a release
>> >>> >>>>> >>> of:
>> >>> >>>>> >>>
>> >>> >>>>> >>> 1. Pageviews per second;
>> >>> >>>>> >>> 2. To enwiki;
>> >>> >>>>> >>> 3. Over $TIME_PERIOD;
>> >>> >>>>> >>> 4. Grouping the mobile and desktop site.
>> >>> >>>>> >>>
>> >>> >>>>> >>> But Dario or someone should chip in before I touch anything ;p
>> >>> >>>>> >>>
>> >>> >>>>> >>> [0] 6am yesterday. 6am because it should be low-traffic, right?
>> >>> >>>>> >>> At least given our biases towards North America and Europe.
>> >>> >>>>> >>>
>> >>> >>>>> >>> On 13 April 2015 at 11:54, Oliver Keyes
>> >>> >>>>> >>> <okeyes@wikimedia.org>
>> >>> >>>>> >>> wrote:
>> >>> >>>>> >>>> Then that sounds much more viable. I'll run a quick test now to
>> >>> >>>>> >>>> see how much clustering we'd see at, say, the one-second
>> >>> >>>>> >>>> resolution level, and throw it out here so we can make more
>> >>> >>>>> >>>> informed decisions about a data release on this.
>> >>> >>>>> >>>>
>> >>> >>>>> >>>> On 13 April 2015 at 08:08, Hirav Gandhi
>> >>> >>>>> >>>> <hirav.gandhi@gmail.com>
>> >>> >>>>> >>>> wrote:
>> >>> >>>>> >>>>> Hi Oliver,
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> Re: Hirav: would you be looking for temporally /and/
>> >>> >>>>> >>>>> contextually granular pageviews, i.e. "a view to X page at Y
>> >>> >>>>> >>>>> time", or just temporally granular, so "a view to a page on
>> >>> >>>>> >>>>> enwiki at X time"? If the latter you've got more of a shot, I
>> >>> >>>>> >>>>> suspect.
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> I only want the latter - I am not concerned with the context so
>> >>> >>>>> >>>>> much as just “a view to a page on enwiki at X time.”
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> Hirav
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> On Apr 13, 2015, at 5:00 AM,
>> >>> >>>>> >>>>> analytics-request@lists.wikimedia.org
>> >>> >>>>> >>> wrote:
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> Today's Topics:
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> 1. Re: Page views on a more frequent than hourly basis (Pine W)
>> >>> >>>>> >>>>> 2. Re: Page views on a more frequent than hourly basis (Oliver Keyes)
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> ----------------------------------------------------------------------
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> Message: 1
>> >>> >>>>> >>>>> Date: Mon, 13 Apr 2015 00:47:31 -0700
>> >>> >>>>> >>>>> From: Pine W <wiki.pine@gmail.com>
>> >>> >>>>> >>>>> To: "A mailing list for the Analytics Team at WMF and
>> >>> >>>>> >>>>> everybody
>> >>> >>>>> >>>>> who
>> >>> >>>>> >>>>> has an interest in Wikipedia and analytics."
>> >>> >>>>> >>>>> <analytics@lists.wikimedia.org>
>> >>> >>>>> >>>>> Cc: Bharath Sitaraman <bharath1028@gmail.com>
>> >>> >>>>> >>>>> Subject: Re: [Analytics] Page views on a more frequent
>> >>> >>>>> >>>>> than
>> >>> >>>>> >>>>> hourly
>> >>> >>>>> >>>>> basis
>> >>> >>>>> >>>>> Message-ID:
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> <CAF=dyJgNUT+t6n6muJq16DuYiWP7et6ruHT3_-TZDnseP+29QQ@mail.gmail.com>
>> >>> >>>>> >>>>> Content-Type: text/plain; charset="utf-8"
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> Hi,
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> This issue of pageview data granularity has been discussed
>> >>> >>>>> >>>>> before, and the answer has been that hourly is the smallest
>> >>> >>>>> >>>>> increment allowed to be revealed publicly, for privacy reasons.
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> I believe that the person you will want to discuss your request
>> >>> >>>>> >>>>> with is Toby, who I have cc'd here.
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> Pine
>> >>> >>>>> >>>>> On Apr 13, 2015 12:11 AM, "Hirav Gandhi"
>> >>> >>>>> >>>>> <hirav.gandhi@gmail.com> wrote:
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> Hi Wikimedia Analytics Team,
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> My colleague Bharath and I are doing research on dynamic server
>> >>> >>>>> >>>>> allocation algorithms and we were looking for suitable datasets
>> >>> >>>>> >>>>> to test our predictive algorithm on. We noticed that Wikimedia
>> >>> >>>>> >>>>> has an amazing data set of hourly page views, but we were
>> >>> >>>>> >>>>> looking for something a bit more granular, such as aggregated
>> >>> >>>>> >>>>> page requests to English Wikipedia on a minute-by-minute or
>> >>> >>>>> >>>>> second-by-second basis if possible.
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> We are more than happy to pore through any raw data you might
>> >>> >>>>> >>>>> have that would help us calculate page requests at this
>> >>> >>>>> >>>>> granular level. Please let us know if it would be possible to
>> >>> >>>>> >>>>> get such data and if so how. Thank you in advance for your
>> >>> >>>>> >>>>> help.
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> Best,
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> Hirav Gandhi
>> >>> >>>>> >>>>> _______________________________________________
>> >>> >>>>> >>>>> Analytics mailing list
>> >>> >>>>> >>>>> Analytics@lists.wikimedia.org
>> >>> >>>>> >>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> ------------------------------
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> Message: 2
>> >>> >>>>> >>>>> Date: Mon, 13 Apr 2015 06:39:45 -0400
>> >>> >>>>> >>>>> From: Oliver Keyes <okeyes@wikimedia.org>
>> >>> >>>>> >>>>> To: "A mailing list for the Analytics Team at WMF and
>> >>> >>>>> >>>>> everybody
>> >>> >>>>> >>>>> who
>> >>> >>>>> >>>>> has an interest in Wikipedia and analytics."
>> >>> >>>>> >>>>> <analytics@lists.wikimedia.org>
>> >>> >>>>> >>>>> Cc: Bharath Sitaraman <bharath1028@gmail.com>
>> >>> >>>>> >>>>> Subject: Re: [Analytics] Page views on a more frequent
>> >>> >>>>> >>>>> than
>> >>> >>>>> >>>>> hourly
>> >>> >>>>> >>>>> basis
>> >>> >>>>> >>>>> Message-ID:
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> <CAAUQgdDsnHd8s+ACL-XBtXBz6OO-T04CcJfnGfqwrYAV-=hxPg@mail.gmail.com>
>> >>> >>>>> >>>>> Content-Type: text/plain; charset=UTF-8
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> Preeetty sure that Toby is on the analytics list, Pine. He's the
>> >>> >>>>> >>>>> director of analytics.
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> Hirav: would you be looking for temporally /and/ contextually
>> >>> >>>>> >>>>> granular pageviews, i.e. "a view to X page at Y time", or just
>> >>> >>>>> >>>>> temporally granular, so "a view to a page on enwiki at X time"?
>> >>> >>>>> >>>>> If the latter you've got more of a shot, I suspect.
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> --
>> >>> >>>>> >>>>> Oliver Keyes
>> >>> >>>>> >>>>> Research Analyst
>> >>> >>>>> >>>>> Wikimedia Foundation
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> ------------------------------
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> End of Analytics Digest, Vol 38, Issue 21
>> >>> >>>>> >>>>> *****************************************
>> >>> >>>>> >> ------------------------------
>> >>> >>>>> >>
>> >>> >>>>> >> Message: 3
>> >>> >>>>> >> Date: Mon, 13 Apr 2015 19:40:04 -0400
>> >>> >>>>> >> From: Oliver Keyes <okeyes@wikimedia.org>
>> >>> >>>>> >> To: "A mailing list for the Analytics Team at WMF and
>> >>> >>>>> >> everybody
>> >>> >>>>> >> who
>> >>> >>>>> >> has an interest in Wikipedia and analytics."
>> >>> >>>>> >> <analytics@lists.wikimedia.org>
>> >>> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than
>> >>> >>>>> >> hourly
>> >>> >>>>> >> basis
>> >>> >>>>> >> Message-ID:
>> >>> >>>>> >>
>> >>> >>>>> >>
>> >>> >>>>> >>
>> >>> >>>>> >> <CAAUQgdD6Z5USsu11VW49fDMBSrhYEjxKU9yOPySEriB79J-5Cg@mail.gmail.com>
>> >>> >>>>> >> Content-Type: text/plain; charset=UTF-8
>> >>> >>>>> >>
>> >>> >>>>> >> ....
>> >>> >>>>> >>
>> >>> >>>>> >>
>> >>> >>>>> >> ...years?
>> >>> >>>>> >>
>> >>> >>>>> >> We have unsampled logs for, ah. 2 months.
>> >>> >>>>> >>
>> >>> >>>>> >> On 13 April 2015 at 19:30, Hirav Gandhi
>> >>> >>>>> >> <hirav.gandhi@gmail.com>
>> >>> >>>>> >> wrote:
>> >>> >>>>> >>> Thanks Oliver!
>> >>> >>>>> >>>
>> >>> >>>>> >>> We would like this data for as broad of a time period as you
>> >>> >>>>> >>> can
>> >>> >>>>> >>> muster. The
>> >>> >>>>> >>> more days, months and year represented in the dataset, the
>> >>> >>>>> >>> better.
>> >>> >>>>> >>>
>> >>> >>>>> >>>>
>> >>> >>>>> >>>> Okay, so:
>> >>> >>>>> >>>>
>> >>> >>>>> >>>> I took an hour from the pageviews logs,[0] and aggregated
>> >>> >>>>> >>>> pageviews to
>> >>> >>>>> >>>> enwiki (mobile and desktop both) by timestamp, down to
>> >>> >>>>> >>>> one-second
>> >>> >>>>> >>>> resolution levels. The lowest number of pageviews to enwiki
>> >>> >>>>> >>>> per
>> >>> >>>>> >>>> second
>> >>> >>>>> >>>> was 2,981
>> >>> >>>>> >>>>
>> >>> >>>>> >>>> So, I don't personally have a problem with generating a
>> >>> >>>>> >>>> release
>> >>> >>>>> >>>> of:
>> >>> >>>>> >>>>
>> >>> >>>>> >>>> 1. Pageviews per second;
>> >>> >>>>> >>>> 2. To enwiki;
>> >>> >>>>> >>>> 3. Over $TIME_PERIOD;
>> >>> >>>>> >>>> 4. grouping the mobile and desktop site
>> >>> >>>>> >>>>
>> >>> >>>>> >>>> But Dario or someone should chip in before I touch anything
>> >>> >>>>> >>>> ;p
>> >>> >>>>> >>>>
>> >>> >>>>> >>>> 6am yesterday. 6am because it should be low-traffic, right?
>> >>> >>>>> >>>> At
>> >>> >>>>> >>>> least
>> >>> >>>>> >>>> given our biases towards north america and europe
>> >>> >>>>> >>>>
>> >>> >>>>> >>>> On 13 April 2015 at 11:54, Oliver Keyes
>> >>> >>>>> >>>> <okeyes@wikimedia.org>
>> >>> >>>>> >>>> wrote:
>> >>> >>>>> >>>>> Then that sounds much more viable. I'll run a quick test
>> >>> >>>>> >>>>> now
>> >>> >>>>> >>>>> to
>> >>> >>>>> >>>>> see
>> >>> >>>>> >>>>> how much clustering we'd see at, say, the one-second
>> >>> >>>>> >>>>> resolution
>> >>> >>>>> >>>>> level,
>> >>> >>>>> >>>>> and throw it out here so we can make more informed
>> >>> >>>>> >>>>> decisions
>> >>> >>>>> >>>>> about a
>> >>> >>>>> >>>>> data release on this.
>> >>> >>>>> >>>>>
>> >>> >>>>> >>>>> On 13 April 2015 at 08:08, Hirav Gandhi
>> >>> >>>>> >>>>> <hirav.gandhi@gmail.com>
>> >>> >>>>> >>>>> wrote:
>> >>> >>>>> >>>>>> Hi Oliver,
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> Re: Hirav: would you be looking for temporally /and/
>> >>> >>>>> >>>>>> contextually
>> >>> >>>>> >>>>>> granular
>> >>> >>>>> >>>>>> pageviews, i.e. "a view to X page at Y time", or just
>> >>> >>>>> >>>>>> temporally
>> >>> >>>>> >>>>>> granular,
>> >>> >>>>> >>>>>> so "a view to a page on enwiki at X time"? If the latter
>> >>> >>>>> >>>>>> you've
>> >>> >>>>> >>>>>> got
>> >>> >>>>> >>>>>> more of
>> >>> >>>>> >>>>>> a shot, I suspect.
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> I only want the latter - I am not concerned with the
>> >>> >>>>> >>>>>> context
>> >>> >>>>> >>>>>> so
>> >>> >>>>> >>>>>> much as
>> >>> >>>>> >>>>>> just
>> >>> >>>>> >>>>>> “a view to a page on enwiki at X time.”
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> Hirav
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> On Apr 13, 2015, at 5:00 AM,
>> >>> >>>>> >>>>>> analytics-request@lists.wikimedia.org
>> >>> >>>>> >>>>>> wrote:
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> Send Analytics mailing list submissions to
>> >>> >>>>> >>>>>> analytics@lists.wikimedia.org
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> To subscribe or unsubscribe via the World Wide Web, visit
>> >>> >>>>> >>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>> >>>>> >>>>>> or, via email, send a message with subject or body 'help'
>> >>> >>>>> >>>>>> to
>> >>> >>>>> >>>>>> analytics-request@lists.wikimedia.org
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> You can reach the person managing the list at
>> >>> >>>>> >>>>>> analytics-owner@lists.wikimedia.org
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> When replying, please edit your Subject line so it is
>> >>> >>>>> >>>>>> more
>> >>> >>>>> >>>>>> specific
>> >>> >>>>> >>>>>> than "Re: Contents of Analytics digest..."
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> Today's Topics:
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> 1. Re: Page views on a more frequent than hourly basis
>> >>> >>>>> >>>>>> (Pine W)
>> >>> >>>>> >>>>>> 2. Re: Page views on a more frequent than hourly basis
>> >>> >>>>> >>>>>> (Oliver
>> >>> >>>>> >>>>>> Keyes)
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> ----------------------------------------------------------------------
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> Message: 1
>> >>> >>>>> >>>>>> Date: Mon, 13 Apr 2015 00:47:31 -0700
>> >>> >>>>> >>>>>> From: Pine W <wiki.pine@gmail.com>
>> >>> >>>>> >>>>>> To: "A mailing list for the Analytics Team at WMF and
>> >>> >>>>> >>>>>> everybody
>> >>> >>>>> >>>>>> who
>> >>> >>>>> >>>>>> has an interest in Wikipedia and analytics."
>> >>> >>>>> >>>>>> <analytics@lists.wikimedia.org>
>> >>> >>>>> >>>>>> Cc: Bharath Sitaraman <bharath1028@gmail.com>
>> >>> >>>>> >>>>>> Subject: Re: [Analytics] Page views on a more frequent
>> >>> >>>>> >>>>>> than
>> >>> >>>>> >>>>>> hourly
>> >>> >>>>> >>>>>> basis
>> >>> >>>>> >>>>>> Message-ID:
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> <CAF=dyJgNUT+t6n6muJq16DuYiWP7et6ruHT3_-TZDnseP+29QQ@mail.gmail.com>
>> >>> >>>>> >>>>>> Content-Type: text/plain; charset="utf-8"
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> Hi,
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> This issue of pageview data granularity has been
>> >>> >>>>> >>>>>> discussed
>> >>> >>>>> >>>>>> before, and
>> >>> >>>>> >>>>>> the
>> >>> >>>>> >>>>>> answer has been that hourly is the smallest increment
>> >>> >>>>> >>>>>> allowed to
>> >>> >>>>> >>>>>> be
>> >>> >>>>> >>>>>> revealed publicly, for privacy reasons.
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> I believe that the person you will want to discuss your
>> >>> >>>>> >>>>>> request
>> >>> >>>>> >>>>>> with is
>> >>> >>>>> >>>>>> Toby, who I have cc'd here.
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> Pine
>> >>> >>>>> >>>>>> On Apr 13, 2015 12:11 AM, "Hirav Gandhi" <hirav.gandhi@gmail.com> wrote:
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> Hi Wikimedia Analytics Team,
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> My colleague Bharath and I are doing research on dynamic server allocation
>> >>> >>>>> >>>>>> algorithms and we were looking for a suitable dataset to test our
>> >>> >>>>> >>>>>> predictive algorithm on. We noticed that Wikimedia has an amazing data set
>> >>> >>>>> >>>>>> of hourly page views, but we were looking for something a bit more
>> >>> >>>>> >>>>>> granular, such as aggregated page requests to English Wikipedia on a
>> >>> >>>>> >>>>>> minute-by-minute basis, or a second-by-second basis if possible.
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> We are more than happy to pore through any raw data you might have that
>> >>> >>>>> >>>>>> would help us calculate page requests at this granular level. Please let us
>> >>> >>>>> >>>>>> know if it would be possible to get such data and, if so, how. Thank you in
>> >>> >>>>> >>>>>> advance for your help.
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> Best,
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> Hirav Gandhi
>> >>> >>>>> >>>>>> _______________________________________________
>> >>> >>>>> >>>>>> Analytics mailing list
>> >>> >>>>> >>>>>> Analytics@lists.wikimedia.org
>> >>> >>>>> >>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> ------------------------------
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> Message: 2
>> >>> >>>>> >>>>>> Date: Mon, 13 Apr 2015 06:39:45 -0400
>> >>> >>>>> >>>>>> From: Oliver Keyes <okeyes@wikimedia.org>
>> >>> >>>>> >>>>>> To: "A mailing list for the Analytics Team at WMF and everybody who
>> >>> >>>>> >>>>>> has an interest in Wikipedia and analytics."
>> >>> >>>>> >>>>>> <analytics@lists.wikimedia.org>
>> >>> >>>>> >>>>>> Cc: Bharath Sitaraman <bharath1028@gmail.com>
>> >>> >>>>> >>>>>> Subject: Re: [Analytics] Page views on a more frequent than hourly basis
>> >>> >>>>> >>>>>> Message-ID: <CAAUQgdDsnHd8s+ACL-XBtXBz6OO-T04CcJfnGfqwrYAV-=hxPg@mail.gmail.com>
>> >>> >>>>> >>>>>> Content-Type: text/plain; charset=UTF-8
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> Preeetty sure that Toby is on the analytics list, Pine. He's the
>> >>> >>>>> >>>>>> director of analytics.
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> Hirav: would you be looking for temporally /and/ contextually granular
>> >>> >>>>> >>>>>> pageviews, i.e. "a view to X page at Y time", or just temporally
>> >>> >>>>> >>>>>> granular, so "a view to a page on enwiki at X time"? If the latter
>> >>> >>>>> >>>>>> you've got more of a shot, I suspect.
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> On 13 April 2015 at 03:47, Pine W <wiki.pine@gmail.com> wrote:
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> Hi,
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> This issue of pageview data granularity has been discussed before, and the
>> >>> >>>>> >>>>>> answer has been that hourly is the smallest increment allowed to be
>> >>> >>>>> >>>>>> revealed publicly, for privacy reasons.
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> I believe that the person you will want to discuss your request with is
>> >>> >>>>> >>>>>> Toby, who I have cc'd here.
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> Pine
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> --
>> >>> >>>>> >>>>>> Oliver Keyes
>> >>> >>>>> >>>>>> Research Analyst
>> >>> >>>>> >>>>>> Wikimedia Foundation
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> ------------------------------
>> >>> >>>>> >>>>>>
>> >>> >>>>> >>>>>> End of Analytics Digest, Vol 38, Issue 21
>> >>> >>>>> >>>>>> *****************************************
>> >>> >>>>> >>>>>>
>> >>> >>>>> >> End of Analytics Digest, Vol 38, Issue 24
>> >>> >>>>> >> *****************************************
>> >>> >>>>> >
>> >>> >>>> --
>> >>> >>>> Dario Taraborelli
>> >>> >>>> Senior Research Scientist, Research and Data Lead
>> >>> >>>> Wikimedia Foundation
>> >>> >>>> http://wikimediafoundation.org
>> >>> >>>> http://nitens.org/taraborelli

--
Oliver Keyes
Research Analyst
Wikimedia Foundation



--
Bharath Sitaraman
bharath@cs.stanford.edu


