Hi Diederik,
[ rearranged response due to top-posting ]
On Mon, Jul 22, 2013 at 06:28:18AM -0700, Diederik van Liere wrote:
On Mon, Jul 22, 2013 at 9:27 AM, Christian Aistleitner christian@quelltextlich.at wrote:
Whom could I ask about what the desired semantics of zero_{carrier,country} are?
Diederik and Evan
So this means that:
zero_country should hold all of the mobile requests from a country, and zero_carrier is a drill-down on the specific carriers.
would be the correct interpretation?
If so, we cannot compute zero_country with the input files we get.
But even if we had all the logs, the code of zero_carrier.pig would not produce a subset of zero_country.pig, as it uses a completely different approach to determine a count-worthy log line [1].
But should not zero_carrier be a subset of zero_country then?
Best regards, Christian
[1] A rough transcript of the predicates is:
* zero_carrier :<==> (carrier is set) && (host contains "wikipedia") && (url path contains "/wiki/") && (url is mobile and mobile is free for carrier || url is zero and zero is free for carrier) && (language is included for carrier)
-------------------
* zero_country :<==> (host contains "wiki")
  && query does not contain "action=opensearch"
  && query does not contain "action=search"
  && query does not contain "title=Special%3ASearch&search"
  && file does not contain "wiki?search"
  && host does not contain "bits"
  && host does not contain "upload"
  && (url is not for image || (url is for image && mime type contains "image"))
  && (url is not for api || (url is for api && mime type is "application" && mime subtype is "json"))
  && ((url path does not contain "wiki" or "/w/index.php") || ((url path contains "wiki" or "/w/index.php") && mime type is "text" && (mime subtype is "html" || mime subtype is "vnd.wap.wml")))
  && (url is not for image && url is not for api && url path does not contain "wiki" && url path does not contain "/w/index.php" && mime type is "text" && mime subtype is "html")
  && response code matches ".*(20\d|302|304).*"
  && lower-cased request method contains "get"
  && ip address is not in 10.0.0.0/8
  && ip address is not in 208.80.152.0/22
  && ip address is not in 91.198.174.0/24
  && user agent does not contain "bot"
  && user agent does not contain "spider"
  && user agent does not contain "http"
  && user agent does not contain "crawler"
  && (url is for api || (url path contains "wiki" or "/w/index.php"))
  && (url is for api && referrer is null && isApiPageViewRequest(url))
  && (url is for api && referrer is not null && isApiPageViewRequest(referrer) && (canonical titles of paramA and paramB are both null || canonical titles of paramA and paramB are not equivalent))
  && (url is for api && referrer is not null && !isApiPageViewRequest(referrer))
  && (url path contains "/wiki/" or "/w/index.php")
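To make the "not a subset" point concrete, here is a minimal Python sketch (not the Pig code itself). It transcribes the zero_carrier predicate above, plus just one conjunct of zero_country: the response code check. The dict-based log line, the field names, and the carrier configuration are all hypothetical and only serve the illustration.

```python
import re

def is_zero_carrier(line, carrier_cfg):
    """Rough transcript of the zero_carrier predicate.

    `line` and `carrier_cfg` are hypothetical dicts, not the real Pig schema.
    """
    return bool(
        line["carrier"] is not None
        and "wikipedia" in line["host"]
        and "/wiki/" in line["path"]
        and (
            (line["is_mobile"] and carrier_cfg["mobile_free"])
            or (line["is_zero"] and carrier_cfg["zero_free"])
        )
        and line["language"] in carrier_cfg["languages"]
    )

def passes_zero_country_status_check(line):
    """Only one necessary conjunct of zero_country: the response code match."""
    return re.match(r".*(20\d|302|304).*", line["response_code"]) is not None

# Hypothetical carrier configuration and log line (a 404 response).
carrier_cfg = {"mobile_free": True, "zero_free": False, "languages": ["en"]}
line = {
    "carrier": "example-carrier",
    "host": "en.m.wikipedia.org",
    "path": "/wiki/Does_not_exist",
    "is_mobile": True,
    "is_zero": False,
    "language": "en",
    "response_code": "404",
}

counted_by_carrier = is_zero_carrier(line, carrier_cfg)
may_be_counted_by_country = passes_zero_country_status_check(line)
```

Under these assumptions the line is counted by zero_carrier, since that predicate never looks at the response code, while zero_country rejects it on the response code check alone.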
-------------------
isApiPageViewRequest(param) :<==> (path of param contains "/w/api.php") && (param's query contains "action=view" || param's query contains "action=mobileview" || param's query contains "action=query")
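As a sketch of how this helper reads, the following Python function mirrors the transcript using plain substring checks. The function name, the example URLs, and the choice of urllib for splitting the URL are assumptions for illustration; the actual code may parse URLs differently.

```python
from urllib.parse import urlsplit

# The three query fragments named in the transcript above.
API_PAGE_VIEW_ACTIONS = ("action=view", "action=mobileview", "action=query")

def is_api_page_view_request(url):
    """Substring-based transcript of isApiPageViewRequest (illustrative only)."""
    parts = urlsplit(url)
    return "/w/api.php" in parts.path and any(
        action in parts.query for action in API_PAGE_VIEW_ACTIONS
    )

# Hypothetical example URLs.
api_view = is_api_page_view_request(
    "http://en.m.wikipedia.org/w/api.php?action=mobileview&page=Foo"
)
api_search = is_api_page_view_request(
    "http://en.wikipedia.org/w/api.php?action=opensearch&search=Foo"
)
plain_page = is_api_page_view_request("http://en.wikipedia.org/wiki/Foo?action=view")
```

Because the checks are substring matches, a query such as "action=viewsomething" would also be accepted; this mirrors the "contains" wording of the transcript rather than strict parameter parsing.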
-------------------