On Mon, Jul 22, 2013 at 11:14 AM, Christian Aistleitner <christian@quelltextlich.at> wrote:
Hi Diederik,

[ rearranged response due to top-posting ]

On Mon, Jul 22, 2013 at 06:28:18AM -0700, Diederik van Liere wrote:
> On Mon, Jul 22, 2013 at 9:27 AM, Christian Aistleitner
> <christian@quelltextlich.at> wrote:
> > Whom could I ask about what the desired semantics of
> > zero_{carrier,country} are?
>
> Diederik and Evan

so this means that:

> > zero_country should
> > hold all of the mobile requests from a country, and zero_carrier is a
> > drill-down on the specific carriers.

would be the correct interpretation?

If so, we cannot compute zero_country with the input files we get.
That's correct. We need the full mobile stream from Kraken, so we have to let go of this for now.

 
But even if we had all the logs, the code of zero_carrier.pig would
not produce a subset of zero_country.pig, as it uses a completely
different approach to determine a count-worthy log line [1].

I think you are looking at the wrong zero_country.pig script; the logic should be the same for zero_country and zero_carrier.

 
But should not zero_carrier be a subset of zero_country then?
Yes, zero_carrier is a subset of zero_country.
 

Best regards,
Christian


[1] A rough transcript of the predicates is:

* zero_carrier :<==>
   (carrier is set)
&& (host contains "wikipedia")
&& (url path contains "/wiki/")
&& (url is mobile and mobile is free for carrier
    || url is zero and zero is free for carrier)
&& (language is included for carrier)

-------------------

* zero_country :<==>
   (host contains "wiki")
&& query does not contain "action=opensearch"
&& query does not contain "action=search"
&& query does not contain "title=Special%3ASearch&search"
&& file does not contain "wiki?search"
&& host does not contain "bits"
&& host does not contain "upload"
&& (url is not for image || (url is for image && mime type contains "image"))
&& (url is not for api  || (url is for api && mime type is "application" && mime subtype is "json"))
&& ((url path does not contain "wiki" or "/w/index.php")
    || ((url path contains "wiki" or "/w/index.php") && mime type is "text"
         && (mime subtype is "html" || mime subtype is "vnd.wap.wml")))
&& (url is not for image
    && url is not for api
    && url path does not contain "wiki"
    && url path does not contain "/w/index.php"
    && mime type is "text" && mime subtype is "html")
&& response code matches ".*(20\\d|302|304).*"
&& lower cased request method contains "get"
&& ip address is not in 10.0.0.0/8
&& ip address is not in 208.80.152.0/22
&& ip address is not in 91.198.174.0/24
&& user agent does not contain "bot"
&& user agent does not contain "spider"
&& user agent does not contain "http"
&& user agent does not contain "crawler"
&& (url is for api || (url path contains "wiki" or "/w/index.php"))
&& (url is for api && referrer is null && isApiPageViewRequest(url))
&& (url is for api && referrer is not null
    && isApiPageViewRequest(referrer)
    && (canonical titles of paramA and paramB are both null
        || canonical titles of paramA and paramB are not equivalent))
&& (url is for api && referrer is not null && !isApiPageViewRequest(referrer))
&& (url path contains "/wiki/" or "/w/index.php")
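
As an illustration only, the address and user-agent clauses of the transcript above could be sketched in Python roughly as follows. This is not the Pig code; the function names are hypothetical, and the case-sensitive substring matching on the user agent is an assumption taken from the transcript's wording rather than from the script.

```python
import ipaddress

# Networks named in the transcript; requests from these are not counted.
INTERNAL_NETWORKS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "208.80.152.0/22", "91.198.174.0/24")
]

# Substrings named in the transcript that mark a user agent as a crawler.
# Assumption: plain case-sensitive substring match.
CRAWLER_TOKENS = ("bot", "spider", "http", "crawler")


def is_external_ip(ip):
    """Return True if `ip` lies in none of the excluded networks."""
    addr = ipaddress.ip_address(ip)
    return not any(addr in net for net in INTERNAL_NETWORKS)


def is_not_crawler(user_agent):
    """Return True if the user agent contains none of the crawler tokens."""
    return not any(token in user_agent for token in CRAWLER_TOKENS)
```

For example, is_external_ip("10.1.2.3") is False because the address falls into 10.0.0.0/8, while is_external_ip("8.8.8.8") is True.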

-------------------

isApiPageViewRequest(param) :<==>
   (path of param contains "/w/api.php")
&& (param's query contains "action=view"
    || param's query contains "action=mobileview"
    || param's query contains "action=query")
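
To make the predicate above concrete, here is a minimal Python sketch of it. It is a rendering of the transcript, not of the Pig code, so the function name, the use of urlsplit, and the handling of a missing value are all assumptions.

```python
from urllib.parse import urlsplit

# Values taken from the transcript above.
API_PATH = "/w/api.php"
PAGE_VIEW_ACTIONS = ("action=view", "action=mobileview", "action=query")


def is_api_page_view_request(param):
    """Return True if `param` (a URL or referrer string) looks like an API page view.

    Assumption: a missing or empty value never counts as a page view.
    """
    if not param:
        return False
    parts = urlsplit(param)
    # Path must point at the API endpoint, and the query must contain
    # one of the page-view actions (plain substring match, as transcribed).
    return API_PATH in parts.path and any(
        action in parts.query for action in PAGE_VIEW_ACTIONS
    )
```

With this sketch, a URL such as http://en.m.wikipedia.org/w/api.php?action=mobileview&page=Foo matches, while an api.php request with action=opensearch does not.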

-------------------


--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
                           Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasze 65a        Email:  christian@quelltextlich.at
4040 Linz, Austria           Phone:          +43 732 / 26 95 63
                             Fax:            +43 732 / 26 95 63
                             Homepage: http://quelltextlich.at/
---------------------------------------------------------------

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics