Hi Diederik,
[ rearranged response due to top-posting ]
On Mon, Jul 22, 2013 at 06:28:18AM -0700, Diederik van Liere wrote:
On Mon, Jul 22, 2013 at 9:27 AM, Christian Aistleitner
<christian(a)quelltextlich.at> wrote:
Whom could I ask about what the desired semantics of
zero_{carrier,country} are?
Diederik and Evan
so this means that:
> zero_country should
> hold all of the mobile requests from a country, and zero_carrier is a
> drill-down on the specific carriers.
would be the correct interpretation?
If so, we cannot compute zero_country with the input files we get.
But even if we had all the logs, the lines counted by zero_carrier.pig
would not be a subset of those counted by zero_country.pig, as the two
scripts use completely different approaches to determine a count-worthy
log line [1].
But should not zero_carrier be a subset of zero_country then?
Best regards,
Christian
[1] A rough transcript of the predicates is:
* zero_carrier :<==>
(carrier is set)
&& (host contains "wikipedia")
&& (url path contains "/wiki/")
&& (url is mobile and mobile is free for carrier
|| url is zero and zero is free for carrier)
&& (language is included for carrier)
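
To make the zero_carrier predicate above easier to experiment with, here
is a minimal Python sketch of it. It is not the code from
zero_carrier.pig. The CarrierConfig class, the function name, and the
assumption that hosts look like <language>.<m|zero>.wikipedia.org are all
hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class CarrierConfig:
    # Hypothetical per-carrier settings; the real configuration is not
    # shown in this thread.
    mobile_is_free: bool
    zero_is_free: bool
    languages: frozenset


def is_zero_carrier_hit(carrier, url, config):
    """Rough re-statement of the zero_carrier predicate from footnote [1]."""
    # (carrier is set)
    if not carrier or config is None:
        return False
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    # (host contains "wikipedia")
    if "wikipedia" not in host:
        return False
    # (url path contains "/wiki/")
    if "/wiki/" not in parts.path:
        return False
    # Assumption: "m" / "zero" appear as separate host labels.
    is_mobile = "m" in labels
    is_zero = "zero" in labels
    if not ((is_mobile and config.mobile_is_free)
            or (is_zero and config.zero_is_free)):
        return False
    # Assumption: the language is the first host label.
    return labels[0] in config.languages
```

With a carrier that only offers the zero site for free in English, a
request to en.zero.wikipedia.org/wiki/... would count, while the same page
on en.m.wikipedia.org would not.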
-------------------
* zero_country :<==>
(host contains "wiki")
&& query does not contain "action=opensearch"
&& query does not contain "action=search"
&& query does not contain "title=Special%3ASearch&search"
&& file does not contain "wiki?search"
&& host does not contain "bits"
&& host does not contain "upload"
&& (url is not for image
    || (url is for image && mime type contains "image"))
&& (url is not for api
    || (url is for api && mime type is "application"
        && mime subtype is "json"))
&& ((url path does not contain "wiki" or "/w/index.php")
    || ((url path contains "wiki" or "/w/index.php")
        && mime type is "text"
        && (mime subtype is "html" || mime subtype is "vnd.wap.wml")))
&& (url is not for image
&& url is not for api
&& url path does not contain "wiki"
&& url path does not contain "/w/index.php"
&& mime type is "text" && mime subtype is "html")
&& response code matches ".*(20\\d|302|304).*"
&& lower cased request method contains "get"
&& ip address is not in 10.0.0.0/8
&& ip address is not in 208.80.152.0/22
&& ip address is not in 91.198.174.0/24
&& user agent does not contain "bot"
&& user agent does not contain "spider"
&& user agent does not contain "http"
&& user agent does not contain "crawler"
&& (url is for api || (url path contains "wiki" or "/w/index.php"))
&& (url is for api && referrer is null && isApiPageViewRequest(url))
&& (url is for api && referrer is not null
    && isApiPageViewRequest(referrer)
    && (canonical titles of paramA and paramB are both null
        || canonical titles of paramA and paramB are not equivalent))
&& (url is for api && referrer is not null
    && !isApiPageViewRequest(referrer))
&& (url path contains "/wiki/" or "/w/index.php")
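
A few of the zero_country conditions above (response code, request
method, internal IP ranges, user agent) are self-contained enough to try
out in isolation. The following Python sketch covers only those four. The
function and constant names are made up for illustration, the example
values such as "TCP_MISS/200" are assumed log field formats, and whether
the user agent check is case-sensitive is an assumption as well.

```python
import ipaddress
import re

# Same pattern as in the transcript; it has to match the whole field.
RESPONSE_OK = re.compile(r".*(20\d|302|304).*")

# The three address ranges excluded by the predicate.
EXCLUDED_NETWORKS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "208.80.152.0/22", "91.198.174.0/24")
]

# Substrings that mark a user agent as a crawler.
CRAWLER_MARKERS = ("bot", "spider", "http", "crawler")


def passes_basic_filters(response, method, ip, user_agent):
    """Check the response, method, IP and user agent conditions only."""
    if RESPONSE_OK.fullmatch(response) is None:
        return False
    if "get" not in method.lower():
        return False
    address = ipaddress.ip_address(ip)
    if any(address in net for net in EXCLUDED_NETWORKS):
        return False
    # Assumption: plain, case-sensitive substring match.
    return not any(marker in user_agent for marker in CRAWLER_MARKERS)
```

Note that the "http" marker also rejects any user agent that embeds a URL,
which is how many crawlers identify themselves.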
-------------------
isApiPageViewRequest(param) :<==>
(path of param contains "/w/api.php")
&& (param's query contains "action=view"
|| param's query contains "action=mobileview"
|| param's query contains "action=query")
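
The isApiPageViewRequest helper above translates almost directly into
code. A minimal Python sketch, assuming the parameter is a full URL string
and using plain substring checks as in the transcript (the Python names
are mine, not taken from the Pig scripts):

```python
from urllib.parse import urlsplit

# Query fragments that mark an API call as a page view.
PAGE_VIEW_ACTIONS = ("action=view", "action=mobileview", "action=query")


def is_api_page_view_request(param):
    """Return True if the URL is an api.php call with a page view action."""
    parts = urlsplit(param)
    if "/w/api.php" not in parts.path:
        return False
    # Substring match, so e.g. "action=viewfoo" would also be accepted.
    return any(action in parts.query for action in PAGE_VIEW_ACTIONS)
```

For example, a URL ending in /w/api.php?action=mobileview&page=Foo is
accepted, while /w/api.php?action=opensearch&search=x is not.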
-------------------
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasze 65a Email: christian(a)quelltextlich.at
4040 Linz, Austria Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
Homepage: http://quelltextlich.at/
---------------------------------------------------------------