Please clarify why the "is_zero" field is needed, since it is nothing more
than a test for ("zero=" in x_analytics). Does having this field
significantly improve performance for zero queries, e.g. "select count(*)
from requests where is_zero = true"? Otherwise it simply identifies
"zero partner" traffic, not whether a request was actually zero-rated.
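To make the equivalence concrete, here is a minimal Python sketch of that
test. It assumes x_analytics is a string of semicolon-separated key=value
pairs; the function names are hypothetical and this is not the actual
refinement code.

```python
def is_zero_substring(x_analytics):
    """Naive check: literal substring test, as described above."""
    return "zero=" in x_analytics


def is_zero_by_key(x_analytics):
    """Stricter check: look for a key named exactly 'zero'.

    Assumes the 'k1=v1;k2=v2' layout; that layout is an assumption here.
    """
    for item in x_analytics.split(";"):
        key, sep, _value = item.strip().partition("=")
        if sep and key == "zero":
            return True
    return False


if __name__ == "__main__":
    # Sample values are made up for illustration only.
    for sample in ("zero=250-99;https=1", "https=1", ""):
        print(repr(sample), is_zero_substring(sample), is_zero_by_key(sample))
```

Either way the result only says that the request carried a zero tag. The
two checks differ only if some other key happens to end in "zero", which
the substring test would also match.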
Thanks!
On Fri, Apr 10, 2015 at 5:16 PM, Oliver Keyes <okeyes(a)wikimedia.org>
wrote:
Cool!
On 10 April 2015 at 17:12, Joseph Allemandou <jallemandou(a)wikimedia.org>
wrote:
Yes Oliver, agent_type = 'spider' includes the results of the
IsCrawler UDF.
On Fri, Apr 10, 2015 at 11:08 PM, Oliver Keyes <okeyes(a)wikimedia.org>
wrote:
>
> What does agent_type add? In the sense that if we're pre-parsing the
> user agent, surely the difference is between "WHERE agent_type !=
> 'spider'" and "WHERE user_agent_map['device_family'] != 'Spider'"?
> Does agent_type include the isCrawler UDF results?
>
> On 10 April 2015 at 16:47, Joseph Allemandou <jallemandou(a)wikimedia.org>
> wrote:
> > And I forgot one field:
> >
> > is_zero - True if a request is made on a zero provider.
> >
> >
> > On Fri, Apr 10, 2015 at 10:36 PM, Leila Zia <leila(a)wikimedia.org> wrote:
> >>
> >> Hi Joseph,
> >>
> >> Thanks for the update, and for doing this. These three items make the
> >> analysis of the data much easier on our end. We've had many requests in
> >> the past that required agent_type and access_method information and
> >> having them readily available is awesome! :-)
> >>
> >> Have a great weekend!
> >>
> >> Leila
> >>
> >> On Fri, Apr 10, 2015 at 1:21 PM, Joseph Allemandou
> >> <jallemandou(a)wikimedia.org> wrote:
> >>>
> >>> Hi Analytics people,
> >>>
> >>> Today brings another batch of additions to the refined webrequest
> >>> table in Hive.
> >>> Now the table contains:
> >>>
> >>> ts - The unix timestamp (milliseconds) version of the dt date
> >>> access_method - The method used to access the site, one of the three
> >>> [mobile app | mobile web | desktop]
> >>> agent_type - To differentiate easily between spiders and users (more
> >>> values may be added later).
> >>>
> >>> These additions are based on the "tags", as defined here:
> >>> https://meta.wikimedia.org/wiki/Research:Page_view
> >>>
> >>> Have a good weekend!
> >>>
> >>> --
> >>> Joseph Allemandou
> >>> Data Engineer @ Wikimedia Foundation
> >>> IRC: joal
> >>>
> >>> _______________________________________________
> >>> Analytics mailing list
> >>> Analytics(a)lists.wikimedia.org
> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>
--
Oliver Keyes
Research Analyst
Wikimedia Foundation