This seems perfect. Is it currently used?
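
To make the guard idea quoted below concrete, here is a minimal sketch in Python. It is not the code in the refinery guard directory. The row shape (hostname, is_pageview), the function names, and the contents of EXCLUDED_HOSTS are all assumptions for illustration, and the donate hostname in particular is a guess at what "donate-wiki" refers to.

```python
# Hypothetical guard: fail loudly if an excluded host is counted as a pageview.
# EXCLUDED_HOSTS and the (hostname, is_pageview) row shape are assumptions,
# not taken from the actual pageview definition code.
EXCLUDED_HOSTS = {"outreach.wikimedia.org", "donate.wikimedia.org"}


def find_violations(rows):
    """Return the sorted excluded hosts that were counted as pageviews.

    rows: iterable of (hostname, is_pageview) pairs.
    """
    return sorted({
        host.lower()
        for host, is_pageview in rows
        if is_pageview and host.lower() in EXCLUDED_HOSTS
    })


def guard(rows):
    """Raise AssertionError if any excluded host appears among the pageviews."""
    violations = find_violations(rows)
    if violations:
        raise AssertionError(
            "excluded hosts counted as pageviews: " + ", ".join(violations)
        )
```

Run against a sample of each day's output, something like this would turn a silent change to the definition into a failed check well inside the 90-day window.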
On 17 August 2015 at 18:03, Andrew Otto <aotto@wikimedia.org> wrote:
> BTW, Christian foresaw this issue and wrote this:
> https://github.com/wikimedia/analytics-refinery-source/tree/master/guard
>
> It should be usable for pageviews too, I think. For this issue, a guard that made sure that outreach.wikimedia.org never appeared would have flagged this as an error.
>
>> On Aug 17, 2015, at 14:45, Oliver Keyes <okeyes@wikimedia.org> wrote:
>>
>> On 17 August 2015 at 13:48, Joseph Allemandou <jallemandou@wikimedia.org> wrote:
>>> Hey Oliver,
>>>
>>> The analytics team is responsible for the pageview definition.
>>> When finding issues, sending an email to the analytics mailing list is the
>>> right thing to do :)
>>>
>>
>> Indeed; my point is not about issues reported upstream. My point is
>> that there appears to currently be absolutely no work done to take
>> this (org-level, highest possible priority) KPI and evaluate it every
>> month or every N days to make sure that, even with the gradual
>> accretion of changes to the input data, it is still extracting what we
>> want. It is down to user-reported issues. The problem with this
>> approach is that after 90 days it is impossible to rerun the data; if
>> there is a bug breaking the logs, and it takes more than 90 days to
>> discover it, those logs are simply broken.
>>
>> In addition, discovering these issues requires a very granular
>> understanding of what the pageviews logs are meant to be capturing
>> that most customers simply will not have. It worked in this case
>> primarily because the customer actually /wrote/ the definition ;p.
>>
>> For public transparency: Joseph and I talked on IRC and will be
>> working on ways to validate data and detect these kinds of regressions
>> in advance.
>>
>>> On our end, we could surely do a better job of communicating changes in the
>>> pageview definition code, so that anybody interested can review, comment, or
>>> ask for documentation.
>>> Emails were sent regularly to the analytics list about updates, except
>>> in the past few months.
>>> We shall get back to that good habit and send notifications with
>>> explanations of the changes.
>>>
>>> Joseph
>>>
>>>
>>>
>>>
>>> On Mon, Aug 17, 2015 at 5:15 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
>>>>
>>>> You should also note that donate-wiki pageviews are making it into the
>>>> counts (again, the definition was designed to exclude these).
>>>>
>>>> Whose job is it to review pageviews and update the definition when
>>>> issues are found?
>>>>
>>>> On 17 August 2015 at 10:32, Oliver Keyes <okeyes@wikimedia.org> wrote:
>>>>> Just to clarify; there is no need to ask me before making changes
>>>>> (obviously I find my approval for pageviews changes being sought
>>>>> incredibly flattering, but I am not the only person involved in this
>>>>> project ;p). What I'm more driving towards is directly informing
>>>>> customers when the definition is adapted.
>>>>>
>>>>> On 17 August 2015 at 10:31, Oliver Keyes <okeyes@wikimedia.org> wrote:
>>>>>> Excellent; thank you.
>>>>>>
>>>>>> On 17 August 2015 at 04:42, Joseph Allemandou
>>>>>> <jallemandou@wikimedia.org> wrote:
>>>>>>> Oliver,
>>>>>>>
>>>>>>> It was my mistake to add the 'outreach' subdomain without
>>>>>>> asking you.
>>>>>>>
>>>>>>> From a documentation perspective, the analytics team documents
>>>>>>> changes at
>>>>>>> https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest and I
>>>>>>> didn't know about the up-to-date documentation you sent.
>>>>>>>
>>>>>>> Tickets have been created to both correct the bug and update the
>>>>>>> documentation pages.
>>>>>>>
>>>>>>> Joseph
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Aug 16, 2015 at 8:47 PM, Oliver Keyes <okeyes@wikimedia.org>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Ah, I see the problem; someone patched it and never documented it.
>>>>>>>>
>>>>>>>> We have documentation at
>>>>>>>>
>>>>>>>> https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters
>>>>>>>> of the generalised filters. There is also a log, on
>>>>>>>> https://meta.wikimedia.org/wiki/Research:Page_view, of changes to the
>>>>>>>> pageview definition.
>>>>>>>>
>>>>>>>> The intent behind both the transparent definition and the log is to
>>>>>>>> ensure that we know what is going /in/ the definition.
>>>>>>>>
>>>>>>>> In this case, somebody has patched the definition
>>>>>>>>
>>>>>>>>
>>>>>>>> (https://github.com/wikimedia/analytics-refinery-source/commit/cc0b6ed7e4f403eaa82235ec6a0f27152b0c2710)
>>>>>>>> to include traffic from outreach.wikimedia.org - a site that was very
>>>>>>>> deliberately and very explicitly excluded from the definition as it
>>>>>>>> was written.
>>>>>>>>
>>>>>>>> There is no explanation of why this change was made, and no
>>>>>>>> documentation of this change even existing outside the actual
>>>>>>>> Java.... Can someone please explain what this is for, and update
>>>>>>>> all the documentation to reflect that? And then could people be
>>>>>>>> very, very clear in future that it is expected there be a log of
>>>>>>>> alterations you make to high-level KPIs beyond the, you know,
>>>>>>>> commit logs?
>>>>>>>>
>>>>>>>> On 16 August 2015 at 14:32, Madhumitha Viswanathan
>>>>>>>> <mviswanathan@wikimedia.org> wrote:
>>>>>>>>> The new one.
>>>>>>>>>
>>>>>>>>> The code that generates it:
>>>>>>>>>
>>>>>>>>> - https://github.com/wikimedia/analytics-refinery/blob/master/hive/pageview/hourly/create_pageview_hourly_table.hql
>>>>>>>>> - https://github.com/wikimedia/analytics-refinery/tree/master/oozie/pageview/hourly
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Aug 16, 2015 at 11:01 AM, Oliver Keyes
>>>>>>>>> <okeyes@wikimedia.org>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Is the pageviews_hourly table meant to contain pageviews according
>>>>>>>>>> to the new or old definition? If old, where can I find aggregates
>>>>>>>>>> for the new one?
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Oliver Keyes
>>>>>>>>>> Count Logula
>>>>>>>>>> Wikimedia Foundation
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Analytics mailing list
>>>>>>>>>> Analytics@lists.wikimedia.org
>>>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> --Madhu :)
>>>>>>>
>>>>>>> --
>>>>>>> Joseph Allemandou
>>>>>>> Data Engineer @ Wikimedia Foundation
>>>>>>> IRC: joal
--
Oliver Keyes
Count Logula
Wikimedia Foundation