Hey,
This may be a known issue, but just in case it isn't: the pageview dumps at
http://dumps.wikimedia.org/other/pagecounts-all-sites/ are meant to
follow the spec set out at
http://dumps.wikimedia.org/other/pagecounts-all-sites/README.txt
Instead, it appears that for (presumably zero-rated) requests we're
ending up with lang_code.zero instead of lang_code.project_variant.
Presumably it's a missed use case in the C/Perl...thing we were using
that got ported to Hive? Check out pagecounts-20150301-000000
for an example.
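If you want to check a dump file for affected rows, a rough sketch along
these lines might help. It assumes each row is whitespace-separated with
the project code as the first of four fields; the function name and the
sample rows are made up for illustration and are not taken from a real dump.

```python
from collections import Counter


def count_zero_projects(lines):
    """Count rows per project code where the code ends in '.zero'.

    Assumption: each row has four whitespace-separated fields and the
    first field is the project code. Rows with another shape are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # not a row in the assumed layout
        project = fields[0]
        if project.endswith(".zero"):
            counts[project] += 1
    return counts


# Hypothetical sample rows, for illustration only.
sample = [
    "en.zero Main_Page 12 34567",
    "en.m Main_Page 40 98765",
    "fr.zero Accueil 3 4567",
    "en.zero Foo 1 100",
]
zero_counts = count_zero_projects(sample)
print(zero_counts)
```

For a real file you would pass the open (decompressed) file object
instead of the sample list.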
I've opened a Phabricator ticket at
https://phabricator.wikimedia.org/T92361; this is just an advisory to
analytics engineers (there is a bug) and to reusers (there is a bug,
and we're aware of it).
Have fun,
--
Oliver Keyes
Research Analyst
Wikimedia Foundation