Hey Hirav!
(Sorry, was arguing with the cluster)
Good news: even with the additional granularity (splitting out mobile and desktop varnishes) Dario and I are comfortable with releasing the data, based on the local minima we see.
Bad news: the cluster is currently fairly overwhelmed. Result: I'm having to run the query day by day and sanitise the results. Bleh.
So: not fast. But running, and the data will be available :).
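For anyone following along, here is a rough sketch of the kind of check being described: count pageviews per second for each access method, one day at a time, and look at the smallest bucket before releasing anything. The row layout, the function names and the threshold are all assumptions made for illustration; the actual query ran on the cluster and is not shown in this thread.

```python
from collections import Counter
from datetime import datetime


def per_second_counts(rows):
    """Count pageviews per (second, access_method) bucket.

    `rows` is an iterable of (iso_timestamp, access_method) tuples.
    This layout is hypothetical, not the real log schema.
    """
    counts = Counter()
    for ts, access_method in rows:
        # Truncate to whole seconds so all views in a second share a bucket.
        second = datetime.fromisoformat(ts).replace(microsecond=0)
        counts[(second, access_method)] += 1
    return counts


def smallest_bucket(counts):
    """Return (key, count) for the bucket with the fewest observations.

    Only buckets with at least one view exist in `counts`, so seconds
    with zero traffic are not considered here.
    """
    return min(counts.items(), key=lambda kv: kv[1])


def safe_to_release(days, threshold):
    """Process one day at a time; True only if every day's minimum
    bucket has at least `threshold` views (threshold is an assumption)."""
    for rows in days:
        counts = per_second_counts(rows)
        if not counts:
            continue  # nothing observed for this day
        _, n = smallest_bucket(counts)
        if n < threshold:
            return False
    return True
```

Splitting by access method makes every bucket smaller, which is why the minimum has to be rechecked whenever the breakdown changes.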
On 17 April 2015 at 19:01, Hirav Gandhi hirav.gandhi@gmail.com wrote:
Hi guys,
Any update on this ask and where/when it would be available?
Hirav
On Thu, Apr 16, 2015 at 12:33 PM, Bharath Sitaraman <bharath@cs.stanford.edu> wrote:
Interested in an Erlang book? :P Pretty sure I have one of those lying around here...
Cheers, Bharath
On Wed, Apr 15, 2015 at 10:38 AM, Oliver Keyes okeyes@wikimedia.org wrote:
I accept payment in books, pull requests and speaking invitations ;p.
(Updated check-the-minimum query running now!)
On 15 April 2015 at 13:35, Hirav Gandhi hirav.gandhi@gmail.com wrote:
Sorry Oliver. Let me know where I can send the beer/coffee money to compensate you for the hard work :)
On Wed, Apr 15, 2015 at 10:34 AM, Oliver Keyes okeyes@wikimedia.org wrote:
/This/ you say 2.5 seconds after I've launched the query ;p. Yes, it is possible, but I'll have to recalculate the likely minimum and check that it's still okay.
On 15 April 2015 at 13:32, Hirav Gandhi hirav.gandhi@gmail.com wrote:
Hi Dario,
One last question - would it be possible to break it out into mobile vs desktop? We are also concerned there might be seasonality effects in there as well. Please let us know.
Best,
Hirav
On Wed, Apr 15, 2015 at 10:27 AM, Dario Taraborelli dtaraborelli@wikimedia.org wrote:
> thanks, both. Let's go ahead with English only and no spiders filtered or mobile/desktop breakdown, per Oliver.
>
> Michelle – given the aggregation level I am fine moving forward with this release, but let me know off-thread if you have any questions.
>
> Dario
>
> On Wed, Apr 15, 2015 at 9:53 AM, Oliver Keyes <okeyes@wikimedia.org> wrote:
>> Dario,
>>
>> No spider filtering, and no split between mobile and desktop; mobile and desktop are grouped.
>>
>> On 15 April 2015 at 12:46, Hirav Gandhi hirav.gandhi@gmail.com wrote:
>> > e.g. German*
>> >
>> > I need more coffee.
>> >
>> > On Wed, Apr 15, 2015 at 9:35 AM, Hirav Gandhi hirav.gandhi@gmail.com wrote:
>> >> Dario - we just want a representative sample of traffic for a popular site like Wikipedia. We thought limiting to the English Wikipedia would be easier.
>> >>
>> >> If we get aggregated data across all language Wikipedia sites, we would need some way to tease out which language is being queried when. Some languages (for e.g. German) we would hypothesize would have more daily seasonality than languages like English.
>> >>
>> >> On Wed, Apr 15, 2015 at 9:32 AM, Dario Taraborelli dtaraborelli@wikimedia.org wrote:
>> >>> Hirav, Bharath – I also want to hear from you if there's a specific reason to ask for English Wikipedia only or if a dataset encompassing aggregate pageviews across all Wikimedia properties would do the job.
>> >>>
>> >>> Dario
>> >>>
>> >>> On Wed, Apr 15, 2015 at 9:09 AM, Dario Taraborelli dtaraborelli@wikimedia.org wrote:
>> >>>> Oliver -- thanks for running a preliminary check, I'm fine releasing this data in aggregate under CC0, I believe it would be valuable for this and other research projects (copying Michelle from Legal).
>> >>>>
>> >>>> Before we do so, though, I want to confirm the specs: aggregate pageviews per second to English Wikipedia, excluding bot traffic, broken down by access method (mobile web vs desktop site, not apps) for a 60-day period. Oliver – are these the filters you used to identify the data point with the smallest number of observations?
>> >>>>
>> >>>> Obviously, we will need to take into account this release when we start working on projects such as
>> >>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_edits
>> >>>> and
>> >>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pagevi...
>> >>>>
>> >>>> Dario
>> >>>>
>> >>>> On Mon, Apr 13, 2015 at 9:37 PM, Oliver Keyes okeyes@wikimedia.org wrote:
>> >>>>> Bumping for Dario, per Pine's excellent example :)
>> >>>>>
>> >>>>> On 13 April 2015 at 22:18, Hirav Gandhi <hirav.gandhi@gmail.com> wrote:
>> >>>>> > Oliver: Two months is fine. Thank you so much for your help!
>> >>>>> >
>> >>>>> >> On Apr 13, 2015, at 4:40 PM, analytics-request@lists.wikimedia.org wrote:
>> >>>>> >>
>> >>>>> >> Today's Topics:
>> >>>>> >>
>> >>>>> >> 1. Re: Page views on a more frequent than hourly basis (Pine W)
>> >>>>> >> 2. Re: Page views on a more frequent than hourly basis (Hirav Gandhi)
>> >>>>> >> 3. Re: Page views on a more frequent than hourly basis (Oliver Keyes)
>> >>>>> >>
>> >>>>> >> Message: 1
>> >>>>> >> Date: Mon, 13 Apr 2015 13:34:23 -0700
>> >>>>> >> From: Pine W wiki.pine@gmail.com
>> >>>>> >> To: "A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics." analytics@lists.wikimedia.org
>> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than hourly basis
>> >>>>> >> Message-ID: <CAF=dyJjZMdfTHZ+0+LwnHb9m8xUOd4WetGCFUXYB9Qyf7CyC5Q@mail.gmail.com>
>> >>>>> >> Content-Type: text/plain; charset="utf-8"
>> >>>>> >>
>> >>>>> >> Hi Oliver, re ccing people who are on list, this is the protocol we followed in IEGCom to ping people who are subscribed and mentioned in certain emails but, like many of us, may automatically move emails from lists directly to folders where they may be unread for days. So there is a reason to do this.
>> >>>>> >>
>> >>>>> >> Thanks,
>> >>>>> >>
>> >>>>> >> Pine