I am trying to migrate the Limn graphs to our own handling. Zero graphs
are currently generated as Limn dashboards. After I applied the filter
below (taken from the HQL for counting article pageviews), our graphs
matched Limn to within about 10%. One partner, however, differs by a
factor of 10, and I would like to find out where that mismatch comes
from. I looked at
https://github.com/wikimedia/analytics-wp-zero but some of the code
seems to be missing from that repo. Any suggestions are welcome. Thanks!

WHERE
    webrequest_source IN ('text', 'mobile')
    AND year=${year}
    AND month=${month}
    AND day=${day}
    AND x_analytics LIKE '%zero=%'
    AND SUBSTR(uri_path, 1, 6) = '/wiki/'
    AND (
        (
            SUBSTR(ip, 1, 9) != '10.128.0.'
            AND SUBSTR(ip, 1, 11) NOT IN (
                '208.80.152.',
                '208.80.153.',
                '208.80.154.',
                '208.80.155.',
                '91.198.174.'
            )
        ) OR x_forwarded_for != '-'
    )
    AND SUBSTR(uri_path, 1, 31) != '/wiki/Special:CentralAutoLogin/'
    AND http_status NOT IN ( '301', '302', '303' )
    AND uri_host RLIKE '^[A-Za-z0-9-]+(\\.(zero|m))?\\.[a-z]*\\.org$'
    AND NOT (SPLIT(TRANSLATE(SUBSTR(uri_path, 7), ' ', '_'), '#')[0]
        RLIKE '^[Uu]ndefined$')
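
One possible way to localize the mismatch is to count, per partner, how many zero-tagged requests survive each predicate of the filter on its own. The query below is only a sketch and has not been run against the cluster. The table name `webrequest` is a placeholder, and the carrier extraction assumes that x_analytics holds semicolon-separated key=value pairs; both are assumptions rather than something confirmed by the analytics-wp-zero repo.

```sql
-- Sketch: per-partner counts of rows that pass each predicate individually.
-- ASSUMPTIONS: table name `webrequest` is a placeholder; x_analytics is
-- assumed to look like 'key=value;key=value' so that zero=<id> can be extracted.
SELECT
    REGEXP_EXTRACT(x_analytics, 'zero=([^;]+)', 1) AS zero_carrier,
    COUNT(*) AS tagged_requests,
    SUM(IF(SUBSTR(uri_path, 1, 6) = '/wiki/', 1, 0)) AS pass_wiki_path,
    SUM(IF(
        (
            SUBSTR(ip, 1, 9) != '10.128.0.'
            AND SUBSTR(ip, 1, 11) NOT IN (
                '208.80.152.', '208.80.153.', '208.80.154.',
                '208.80.155.', '91.198.174.'
            )
        ) OR x_forwarded_for != '-', 1, 0)) AS pass_ip_check,
    SUM(IF(SUBSTR(uri_path, 1, 31) != '/wiki/Special:CentralAutoLogin/',
        1, 0)) AS pass_not_autologin,
    SUM(IF(http_status NOT IN ('301', '302', '303'), 1, 0)) AS pass_status,
    SUM(IF(uri_host RLIKE '^[A-Za-z0-9-]+(\\.(zero|m))?\\.[a-z]*\\.org$',
        1, 0)) AS pass_host
FROM webrequest
WHERE
    webrequest_source IN ('text', 'mobile')
    AND year=${year}
    AND month=${month}
    AND day=${day}
    AND x_analytics LIKE '%zero=%'
GROUP BY REGEXP_EXTRACT(x_analytics, 'zero=([^;]+)', 1);
```

If the row for the outlier partner shows one pass_* column far below tagged_requests while the other partners do not, that predicate is the likely source of the factor-of-10 difference. If every column is close to tagged_requests, the difference more likely comes from how Limn counts that partner than from this filter.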