Oliver,

It was my mistake to add the 'outreach' subdomain without asking you.

As for documentation, the Analytics team records changes here: https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest. I didn't know about the up-to-date documentation you pointed to.

Tickets have been created to both correct the bug and update the documentation pages.

Joseph



On Sun, Aug 16, 2015 at 8:47 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
Ah, I see the problem; someone patched it and never documented it.

We have documentation at
https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters
of the generalised filters. There is also a log, on
https://meta.wikimedia.org/wiki/Research:Page_view, of changes to the
pageview definition.

The intent behind both the transparent definition and the log is to
ensure that we know what is going /in/ the definition.

In this case, somebody has patched the definition
(https://github.com/wikimedia/analytics-refinery-source/commit/cc0b6ed7e4f403eaa82235ec6a0f27152b0c2710)
to include traffic from outreach.wikimedia.org - a site that was very
deliberately and very explicitly excluded from the definition as it
was written.

There is no explanation of why this change was made, and no
documentation that this change even exists outside the actual Java.
Can someone please explain what this is for, and update all the
documentation to reflect that? And then could people be very, very
clear in future that it is expected there be a log of alterations you
make to high-level KPIs beyond the, you know, commit logs.

On 16 August 2015 at 14:32, Madhumitha Viswanathan
<mviswanathan@wikimedia.org> wrote:
> The new one.
>
> The code that generates it:
>
> - https://github.com/wikimedia/analytics-refinery/blob/master/hive/pageview/hourly/create_pageview_hourly_table.hql
> - https://github.com/wikimedia/analytics-refinery/tree/master/oozie/pageview/hourly
>
> On Sun, Aug 16, 2015 at 11:01 AM, Oliver Keyes <okeyes@wikimedia.org> wrote:
>>
>> Is the pageviews_hourly table meant to contain pageviews according to
>> the new or old definition? If old, where can I find aggregates for the
>> new one?
>>
>> --
>> Oliver Keyes
>> Count Logula
>> Wikimedia Foundation
>>
>> _______________________________________________
>> Analytics mailing list
>> Analytics@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>
> --
> --Madhu :)



--
Oliver Keyes
Count Logula
Wikimedia Foundation


--
Joseph Allemandou
Data Engineer @ Wikimedia Foundation
IRC: joal