Indeed. For transparency: Joseph, Andrew and I met late last week to talk about how we handle these issues. We resolved to adopt positive as well as negative checking, probably using Christian's "guard" framework.
So, for example, suppose we want to make sure the projects are what we expect. One way is to have unit tests that contain things we do and don't want, and to make sure they all pass on example data. In addition, though, we can build a list of /all/ the projects we want and check the pageviews_hourly table against that list once every N, issuing an error if any project appears that isn't on the list. Some of those errors will be false positives, but that is the advantage of positive checks: when the check is wrong, it tells you. When unit tests are wrong, they don't always ;)
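A rough sketch of what such a positive (allowlist) check could look like, assuming the periodic job can pull the distinct project names out of pageviews_hourly. This is not the API of Christian's "guard" framework; EXPECTED_PROJECTS, UnexpectedProjectsError, find_unexpected_projects and positive_check are hypothetical names, and the project strings are placeholders rather than the real values in the table.

```python
from typing import FrozenSet, Iterable, Set

# Hypothetical allowlist: in practice this would be the full, maintained list
# of every project we expect to see in pageviews_hourly.
EXPECTED_PROJECTS: FrozenSet[str] = frozenset(
    {"en.wikipedia", "de.wikipedia", "commons.wikimedia"}
)


class UnexpectedProjectsError(Exception):
    """Raised when the positive check finds projects outside the allowlist."""


def find_unexpected_projects(
    observed: Iterable[str], expected: FrozenSet[str]
) -> Set[str]:
    """Return every observed project name that is not in the allowlist."""
    return set(observed) - expected


def positive_check(
    observed: Iterable[str], expected: FrozenSet[str] = EXPECTED_PROJECTS
) -> None:
    """Fail loudly on any project we did not explicitly expect.

    A hit may be a false positive (e.g. a legitimate new project), but unlike
    a passing unit test, a wrong allowlist announces itself.
    """
    unexpected = find_unexpected_projects(observed, expected)
    if unexpected:
        raise UnexpectedProjectsError(
            f"{len(unexpected)} unexpected project(s): {sorted(unexpected)}"
        )
```

The scheduled job would call positive_check with the distinct projects seen in the latest window. When it raises, someone either fixes the data or adds the new project to the list; either way, the error surfaced something a negative unit test would have passed silently.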
On 23 August 2015 at 04:34, Federico Leva (Nemo) nemowiki@gmail.com wrote:
> Tilman Bayer, 22/08/2015 19:33:
>> And I know that other issues were caught by ErikZ's proactive vigilance, which will need to find an equivalent in the upcoming replacement for Wikistats.
>
> +1
>
> Nemo
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics