Afterwards, I've been told that this is ok, as zero_country should hold all of the mobile requests from a country, and zero_carrier is a drill-down on the specific carriers.
You know, since we are filtering on the X-Analytics header to capture these logs, we are not going to be able to get zero_country from them. These logs are only a subset of the mobile webrequest logs.
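Here is a toy illustration of why per-country totals built from header-filtered logs cannot equal all mobile requests per country. It is a minimal sketch. The record layout (country, carrier, a has_x_analytics flag) and all values are made up and are not the real webrequest format.

```python
from collections import Counter

# Toy request records: (country, carrier, has_x_analytics).
# Fields and values are hypothetical, chosen only to illustrate the point.
records = [
    ("DE", None, False),
    ("DE", None, False),
    ("KE", "carrier-a", True),
    ("KE", None, False),
    ("KE", "carrier-a", True),
]

def country_counts(rows):
    """Count requests per country over whichever rows are supplied."""
    return Counter(country for country, _carrier, _tagged in rows)

# Per-country totals over all mobile requests.
all_mobile = country_counts(records)

# Logs captured by filtering on the X-Analytics header keep only tagged requests,
# so per-country totals computed from them miss every untagged request.
tagged_only = [r for r in records if r[2]]
from_filtered_logs = country_counts(tagged_only)

print("all mobile requests:", dict(all_mobile))
print("from filtered logs:", dict(from_filtered_logs))
```

The filtered totals are smaller for KE and DE drops out entirely. Any "country" figure computed from such logs therefore counts only tagged traffic, not all mobile traffic.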
On Jul 22, 2013, at 9:27 AM, Christian Aistleitner <christian@quelltextlich.at> wrote:
Hi,
when doing some basic sanity checks between the outputs of the existing zero_country and zero_carrier Pig scripts, it seems that the daily sum of requests in the zero_country output is ~40k larger than in the zero_carrier output.
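The sanity check described above can be sketched as follows: sum each output's request counts per day and compare the totals. This is only a sketch. It assumes each output can be read as (day, key, request_count) rows, where key is a country for zero_country and a carrier for zero_carrier. The row format and the toy numbers are hypothetical and are not the scripts' real output.

```python
from collections import defaultdict

# Hypothetical output rows: (day, key, request_count). The values are toy
# numbers for illustration, not real traffic figures.
zero_country_rows = [
    ("day-1", "KE", 7),
    ("day-1", "DE", 2),
]
zero_carrier_rows = [
    ("day-1", "carrier-a", 4),
    ("day-1", "carrier-b", 3),
]

def daily_totals(rows):
    """Sum request counts per day, ignoring the country/carrier key."""
    totals = defaultdict(int)
    for day, _key, count in rows:
        totals[day] += count
    return dict(totals)

def daily_differences(country_rows, carrier_rows):
    """Return zero_country total minus zero_carrier total for each day."""
    country = daily_totals(country_rows)
    carrier = daily_totals(carrier_rows)
    days = sorted(set(country) | set(carrier))
    return {day: country.get(day, 0) - carrier.get(day, 0) for day in days}

for day, diff in daily_differences(zero_country_rows, zero_carrier_rows).items():
    print(day, diff)
```

A non-zero difference only shows that the two outputs disagree. Whether that is a bug depends on which semantics zero_country and zero_carrier are meant to have.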
First, I've been told that the two daily sums have to match.
Afterwards, I've been told that this is ok, as zero_country should hold all of the mobile requests from a country, and zero_carrier is a drill-down on the specific carriers.
When reading the Pig scripts/Java code, it is obvious that the first explanation does not match the code. The scripts take completely different paths through our code base and count completely different things :-(
However, the latter explanation does not make much sense to me either, as it's hard to believe that requests from our zero partners make up >90% of each country's mobile requests. Besides, this explanation would not match how we generate the raw log files.
Whom could I ask about what the desired semantics of zero_{carrier,country} are?
Best regards, Christian
-- 
---- quelltextlich e.U. ---- \ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasze 65a
4040 Linz, Austria
Email: christian@quelltextlich.at
Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
Homepage: http://quelltextlich.at/
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics