I tried to move Zero analytics to the new table and decided to test the
wonderful new fields like agent_type... and they only work on the most
recent hours of data :(
On Fri, Apr 10, 2015 at 8:51 PM, Yuri Astrakhan <yastrakhan(a)wikimedia.org>
wrote:
Please clarify why the field "is_zero" is needed, as it is nothing more
than a test for ("zero=" in x_analytics). Does having this field
significantly improve performance for Zero queries, e.g. "select count(*)
from requests where is_zero = true"? Otherwise it simply identifies
"Zero partner" traffic, not whether the request was actually zero-rated.
Thanks!
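For reference, here is a minimal Python sketch of two ways is_zero could be derived from the X-Analytics header. It assumes the header is a ';'-separated list of key=value pairs, for example "zero=250-99;https=1". The header values and both function names are hypothetical. The plain substring test also matches a key that only ends in "zero" (the made-up "nozero=1"), while the parsed variant does not. Neither version can tell whether a request was actually zero-rated.

```python
# Minimal sketch, assuming X-Analytics is a ';'-separated list of key=value
# pairs. Function names and example header values are hypothetical.

def is_zero_substring(x_analytics: str) -> bool:
    """The simple test from the question: does the header contain 'zero='?"""
    return "zero=" in x_analytics


def is_zero_parsed(x_analytics: str) -> bool:
    """Stricter variant: True only if a key named exactly 'zero' is present."""
    for pair in x_analytics.split(";"):
        key, _, _ = pair.partition("=")
        if key.strip() == "zero":
            return True
    return False


if __name__ == "__main__":
    for header in ["zero=250-99;https=1", "https=1", "nozero=1"]:
        print(header, is_zero_substring(header), is_zero_parsed(header))
```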
On Fri, Apr 10, 2015 at 5:16 PM, Oliver Keyes <okeyes(a)wikimedia.org>
wrote:
>
> Cool!
>
> On 10 April 2015 at 17:12, Joseph Allemandou <jallemandou(a)wikimedia.org>
> wrote:
> > Yes Oliver, agent_type = 'spider' includes the isCrawler UDF results.
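The following hypothetical sketch shows how an agent_type value of 'spider' could combine the parsed device family with a crawler check, as the answer above describes. In it, is_crawler() is a crude regex stand-in and does not reproduce the real isCrawler UDF. The parsed user agent is modelled as a plain dict, and 'user' as the non-spider value is an assumption.

```python
import re

# Assumption: a crude pattern standing in for the real isCrawler UDF logic.
_CRAWLER_PATTERN = re.compile(r"bot|crawler|spider", re.IGNORECASE)


def is_crawler(user_agent: str) -> bool:
    """Hypothetical stand-in for the isCrawler UDF."""
    return bool(_CRAWLER_PATTERN.search(user_agent))


def agent_type(user_agent: str, user_agent_map: dict) -> str:
    """Return 'spider' if either the parsed device family or the crawler
    check flags the request, otherwise 'user' (assumed value)."""
    if user_agent_map.get("device_family") == "Spider" or is_crawler(user_agent):
        return "spider"
    return "user"
```

Under these assumptions, filtering on agent_type != 'spider' removes more traffic than filtering on device_family != 'Spider' alone.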
> >
> > On Fri, Apr 10, 2015 at 11:08 PM, Oliver Keyes <okeyes(a)wikimedia.org>
> > wrote:
> >>
> >> What does agent_type add? In the sense that, if we're pre-parsing the
> >> user agent, surely the difference is between "WHERE agent_type !=
> >> 'spider'" and "WHERE user_agent_map['device_family'] != 'Spider'"?
> >> Does agent_type include the isCrawler UDF results?
> >>
> >> On 10 April 2015 at 16:47, Joseph Allemandou
> >> <jallemandou(a)wikimedia.org>
> >> wrote:
> >> > And I forgot one field:
> >> >
> >> > is_zero - True if the request is made through a Zero provider.
> >> >
> >> >
> >> > On Fri, Apr 10, 2015 at 10:36 PM, Leila Zia <leila(a)wikimedia.org>
> >> > wrote:
> >> >>
> >> >> Hi Joseph,
> >> >>
> >> >> Thanks for the update, and for doing this. These three items make
> >> >> the analysis of the data much easier on our end. We've had many
> >> >> requests in the past that required agent_type and access_method
> >> >> information and having them readily available is awesome! :-)
> >> >>
> >> >> Have a great weekend!
> >> >>
> >> >> Leila
> >> >>
> >> >> On Fri, Apr 10, 2015 at 1:21 PM, Joseph Allemandou
> >> >> <jallemandou(a)wikimedia.org> wrote:
> >> >>>
> >> >>> Hi Analytics people,
> >> >>>
> >> >>> Today brings another batch of additions to the refined webrequest
> >> >>> table in Hive.
> >> >>> Now the table contains:
> >> >>>
> >> >>> ts - The Unix timestamp (in milliseconds) of the dt field
> >> >>> access_method - The method used to access the site, one of
> >> >>> [mobile app | mobile web | desktop]
> >> >>> agent_type - Makes it easy to differentiate between spiders and
> >> >>> users (more values may be added later).
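This is a minimal sketch of how ts relates to dt. It assumes dt is an ISO 8601 UTC string with second precision, such as "2015-04-10T20:51:00"; that format is an assumption here, not a quote from the table schema. The function name dt_to_ts is hypothetical.

```python
from datetime import datetime, timezone


def dt_to_ts(dt: str) -> int:
    """Convert a dt string (assumed UTC, 'YYYY-MM-DDTHH:MM:SS') to Unix milliseconds."""
    parsed = datetime.strptime(dt, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    # timestamp() returns seconds as a float; scale to whole milliseconds.
    return int(parsed.timestamp()) * 1000
```

Under these assumptions, "2015-01-01T00:00:00" maps to 1420070400000.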
> >> >>>
> >> >>> These additions are based on the "tags" as defined here:
> >> >>> https://meta.wikimedia.org/wiki/Research:Page_view
> >> >>>
> >> >>> Have a good weekend!
> >> >>>
> >> >>> --
> >> >>> Joseph Allemandou
> >> >>> Data Engineer @ Wikimedia Foundation
> >> >>> IRC: joal
> >> >>>
> >> >>> _______________________________________________
> >> >>> Analytics mailing list
> >> >>> Analytics(a)lists.wikimedia.org
> >> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >> >>>
> >> >>
> >> >>
> >> >>
> >> >
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Oliver Keyes
> >> Research Analyst
> >> Wikimedia Foundation
> >>
> >
> >
> >
> >
> >
> >
>
>
>
>