Hi Diederik,
[ rearranged response due to top-posting ]
On Mon, Jul 22, 2013 at 06:28:18AM -0700, Diederik van Liere wrote:
> On Mon, Jul 22, 2013 at 9:27 AM, Christian Aistleitner
> <christian@quelltextlich.at> wrote:
> > Whom could I ask about what the desired semantics of
> > zero_{carrier,country} are?
>
> Diederik and Evan

So this means that:

> > zero_country should hold all of the mobile requests from a country,
> > and zero_carrier is a drill-down on the specific carriers.

would be the correct interpretation?
If so, we cannot compute zero_country with the input files we get.
But even if we had all the logs, zero_carrier.pig would not produce a
subset of zero_country.pig's output, as the two scripts use completely
different approaches to decide whether a log line should be counted [1].
Shouldn't zero_carrier be a subset of zero_country under that interpretation?
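One way to test the subset question on a log sample is to look for lines that the narrower predicate counts but the wider one does not. In this Python sketch, `carrier_like` and `country_like` are hypothetical, heavily simplified stand-ins that keep only a couple of the clauses from [1]; even so, they already disagree on a crawler request, because only zero_country filters on the user agent.

```python
from typing import Callable, Dict, Iterable, List

LogLine = Dict[str, str]
Predicate = Callable[[LogLine], bool]


def subset_violations(lines: Iterable[LogLine], sub: Predicate, sup: Predicate) -> List[LogLine]:
    """Return lines counted by `sub` but not by `sup`.

    An empty result means this sample holds no counterexample to sub ⊆ sup.
    """
    return [line for line in lines if sub(line) and not sup(line)]


# Hypothetical, simplified stand-ins: each keeps only a few clauses from [1].
def carrier_like(line: LogLine) -> bool:
    return "wikipedia" in line["host"] and "/wiki/" in line["path"]


def country_like(line: LogLine) -> bool:
    return "wiki" in line["host"] and "bot" not in line["user_agent"]


sample = [
    {"host": "en.zero.wikipedia.org", "path": "/wiki/Linz", "user_agent": "Mozilla/5.0"},
    {"host": "en.zero.wikipedia.org", "path": "/wiki/Linz", "user_agent": "Googlebot/2.1"},
]
violations = subset_violations(sample, carrier_like, country_like)
```

Here the Googlebot line is the only violation: it passes the carrier-style check but is dropped by the country-style user-agent filter.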
Best regards,
Christian
[1] A rough transcript of the predicates is:
* zero_carrier :<==>
(carrier is set)
&& (host contains "wikipedia")
&& (url path contains "/wiki/")
&& (url is mobile and mobile is free for carrier
|| url is zero and zero is free for carrier)
&& (language is included for carrier)
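A minimal Python sketch of the zero_carrier predicate above. The `Carrier` and `LogLine` types are hypothetical, and how "url is mobile", "url is zero", and the language are derived from the host name is a guess (the transcript does not say); this is not the code in zero_carrier.pig.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Carrier:
    # Hypothetical per-carrier settings; the real configuration is not shown here.
    mobile_free: bool
    zero_free: bool
    languages: Set[str] = field(default_factory=set)


@dataclass
class LogLine:
    host: str
    path: str
    carrier: Optional[str]  # None when no carrier was identified


def is_mobile(host: str) -> bool:
    # Assumption: mobile hosts look like "en.m.wikipedia.org".
    return ".m." in host


def is_zero(host: str) -> bool:
    # Assumption: zero hosts look like "en.zero.wikipedia.org".
    return ".zero." in host


def language(host: str) -> str:
    # Assumption: the language code is the first label of the host name.
    return host.split(".", 1)[0]


def zero_carrier(line: LogLine, carriers: Dict[str, Carrier]) -> bool:
    # "carrier is set"; unknown carriers are rejected too, so the lookup is safe.
    if line.carrier is None or line.carrier not in carriers:
        return False
    c = carriers[line.carrier]
    return (
        "wikipedia" in line.host
        and "/wiki/" in line.path
        and ((is_mobile(line.host) and c.mobile_free)
             or (is_zero(line.host) and c.zero_free))
        and language(line.host) in c.languages
    )
```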
-------------------
* zero_country :<==>
(host contains "wiki")
&& query does not contain "action=opensearch"
&& query does not contain "action=search"
&& query does not contain "title=Special%3ASearch&search"
&& file does not contain "wiki?search"
&& host does not contain "bits"
&& host does not contain "upload"
&& (url is not for image || (url is for image && mime type contains "image"))
&& (url is not for api || (url is for api && mime type is "application" && mime subtype is "json"))
&& ((url path does not contain "wiki" or "/w/index.php")
|| ((url path contains "wiki" or "/w/index.php") && mime type is "text"
&& (mime subtype is "html" || mime subtype is "vnd.wap.wml")))
&& (url is not for image
&& url is not for api
&& url path does not contain "wiki"
&& url path does not contain "/w/index.php"
&& mime type is "text" && mime subtype is "html")
&& response code matches ".*(20\\d|302|304).*"
&& lower cased request method contains "get"
&& ip address is not in 10.0.0.0/8
&& ip address is not in 208.80.152.0/22
&& ip address is not in 91.198.174.0/24
&& user agent does not contain "bot"
&& user agent does not contain "spider"
&& user agent does not contain "http"
&& user agent does not contain "crawler"
&& (url is for api || (url path contains "wiki" or "/w/index.php"))
&& (url is for api && referrer is null && isApiPageViewRequest(url))
&& (url is for api && referrer is not null
&& isApiPageViewRequest(referrer)
&& (canonical titles of paramA and paramB are both null
|| canonical titles of paramA and paramB are not equivalent))
&& (url is for api && referrer is not null && !isApiPageViewRequest(referrer))
&& (url path contains "/wiki/" or "/w/index.php")
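The client-side clauses of zero_country (response code, request method, excluded IP ranges, user agent) can be sketched as one Python function. `passes_client_filters` is a hypothetical name, the inputs are assumed to be raw log fields as strings, and the URL and MIME-type clauses are deliberately left out.

```python
import ipaddress
import re

# Networks excluded by zero_country, as listed in the transcript above.
EXCLUDED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("208.80.152.0/22"),
    ipaddress.ip_network("91.198.174.0/24"),
]
# Case-sensitive substrings, mirroring the transcript's "does not contain".
BOT_MARKERS = ("bot", "spider", "http", "crawler")
# Same pattern as in the transcript; fullmatch mirrors a whole-string "matches".
STATUS_RE = re.compile(r".*(20\d|302|304).*")


def passes_client_filters(status: str, method: str, ip: str, user_agent: str) -> bool:
    if not STATUS_RE.fullmatch(status):
        return False
    if "get" not in method.lower():
        return False
    # Raises ValueError on a malformed address; real input may need a guard.
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in EXCLUDED_NETWORKS):
        return False
    return not any(marker in user_agent for marker in BOT_MARKERS)
```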
-------------------
isApiPageViewRequest(param) :<==>
(path of param contains "/w/api.php")
&& (param's query contains "action=view"
|| param's query contains "action=mobileview"
|| param's query contains "action=query")
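isApiPageViewRequest translates almost directly. This Python sketch assumes the parameter is a full URL string and keeps the transcript's plain substring checks, so a query such as "action=viewfoo" would also match.

```python
from urllib.parse import urlsplit

# Query fragments that mark an API page-view request, per the transcript.
PAGE_VIEW_ACTIONS = ("action=view", "action=mobileview", "action=query")


def is_api_page_view_request(url: str) -> bool:
    parts = urlsplit(url)
    return "/w/api.php" in parts.path and any(
        action in parts.query for action in PAGE_VIEW_ACTIONS
    )
```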
-------------------
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasze 65a Email: christian@quelltextlich.at
4040 Linz, Austria Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
Homepage: http://quelltextlich.at/
---------------------------------------------------------------
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics