Well, again: the wikistats data that Erik refers to has no
finer-than-monthly granularity within the period this dataset covers.
Monthly data misses sub-monthly changes, like a massive transition that
only shows up in the day-by-day numbers.
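
To make the granularity point concrete, here is a minimal sketch with
made-up numbers. The daily levels (1.15B and 0.65B), the 28-day month and
the day of the drop are all assumptions chosen for illustration, not values
taken from webrequest_all_sites or wikistats, and the function names are
hypothetical. It shows that a series with a mid-month step change and a
completely flat series can produce the same monthly total, so the monthly
figure alone cannot confirm or rule out the drop.

```python
# Illustrative only: hypothetical daily pageviews, in billions.
DAYS_IN_MONTH = 28   # assumed month length
DROP_DAY = 15        # assumed first day at the lower level
HIGH, LOW = 1.15, 0.65  # assumed levels before and after the drop


def monthly_total(daily):
    """Sum of the daily counts, i.e. what a monthly report would show."""
    return sum(daily)


def largest_daily_step(daily):
    """Largest absolute change between two consecutive days."""
    return max(abs(b - a) for a, b in zip(daily, daily[1:]))


# Series with an abrupt drop halfway through the month.
stepped = [HIGH if day < DROP_DAY else LOW
           for day in range(1, DAYS_IN_MONTH + 1)]

# Flat series with the same daily mean as the stepped one.
flat_level = monthly_total(stepped) / DAYS_IN_MONTH
flat = [flat_level] * DAYS_IN_MONTH

for name, series in (("stepped", stepped), ("flat", flat)):
    print(name,
          "monthly total:", round(monthly_total(series), 2),
          "largest daily step:", round(largest_daily_step(series), 2))
```

Under these assumptions both series report the same monthly total, while
only the daily view shows a step of roughly 0.5B in one of them.
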
On 12 March 2015 at 18:21, Toby Negrin <tnegrin@wikimedia.org> wrote:
> I'm also confused. As I understand it, stats.wikimedia.org is consuming the
> data that is represented by the green line in your graph. Therefore we would
> see this drop in the wikistats data that Erik referred to, but we don't. I
> think we need to understand why this is so.
>
> -Toby
>
> On Thu, Mar 12, 2015 at 3:10 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
>>
>> Well, I'm no longer our resident anything expert, merely /a/ anything
>> expert :).
>>
>> The "concoction", as you put it, comes from the webrequest_all_sites
>> data that is consumed by stats.wikimedia.org's primary report - I
>> can't speak for how the dashboard you're linking to is constructed.
>> Perhaps you could? I doubt this is a "concoction" problem given that,
>> as you will note if you've studied the visualisations, both the UDF
>> and the hive query implementation (which were written by two different
>> people, and code reviewed by two /more/ people) agree that this
>> dramatic, unexplained and untracked drop happened. And, since we've
>> been using the hive query implementation for all our high-level
>> numbers for about six months, a bug of this magnitude in the
>> /implementation/ of the definition would be... worrying.
>>
>> Indeed, your report says 20B per month (again, is it drawing from the
>> same data source as the aggregate, high-level number?) - I never
>> claimed 1.1B a day, you did. Instead, it started off at approximately
>> 1.1-1.2B a day, before dropping to between 0.6B and 0.7B, where it has
>> stayed ever since. That sounds, averaged, like approximately 0.75B,
>> no? That is the disadvantage of comparing a single monthly number
>> against a more granular dataset.
>>
>> On 12 March 2015 at 17:55, Erik Zachte <ezachte@wikimedia.org> wrote:
>> > I'd rather see you explain this, Oliver, as our incumbent page views
>> > expert.
>> > Your concoction of legacy PV seems to suggest 'Old definition, UDF' was
>> > about 1.1B per day.
>> >
>> > Yet http://stats.wikimedia.org/EN/TablesPageViewsMonthlyAllProjects.htm
>> > shows 20B per month, 0.75B per day
>> >
>> > Erik
>> >
>> > -----Original Message-----
>> > From: analytics-bounces@lists.wikimedia.org
>> > [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Oliver Keyes
>> > Sent: Thursday, March 12, 2015 19:38
>> > To: A mailing list for the Analytics Team at WMF and everybody who has
>> > an interest in Wikipedia and analytics.
>> > Subject: [Analytics] [Technical] final pageviews QA
>> >
>> > Hey all,
>> >
>> > After the patches to the definition following the previous hand-coding
>> > run (see older threads), I've run a second set of tests. These can be seen at
>> > https://commons.wikimedia.org/wiki/File:Pageviews_QA_2.png and
>> > https://commons.wikimedia.org/wiki/File:Pageviews_QA_jittered_2.png
>> >
>> > There's nothing particularly shocking in the new definition; it follows
>> > the seasonal pattern that we're used to. I think we can call the new
>> > definition done, with these tweaks! It's also not as unstable as the legacy
>> > definition (good luck to whoever now has the responsibility of explaining
>> > why pageviews abruptly halved in the middle of February).
>> >
>> >
>> > Have fun,
>> > --
>> > Oliver Keyes
>> > Research Analyst
>> > Wikimedia Foundation
>> >
>> > _______________________________________________
>> > Analytics mailing list
>> > Analytics@lists.wikimedia.org
>> > https://lists.wikimedia.org/mailman/listinfo/analytics
>> >
>> >
>>
>>
>>
>> --
>> Oliver Keyes
>> Research Analyst
>> Wikimedia Foundation
>>
>
>
>
>
--
Oliver Keyes
Research Analyst
Wikimedia Foundation