I tried to move Zero analytics to the new table and decided to test the wonderful new fields such as agent_type, but they are only populated for the most recent hours of data :(

https://phabricator.wikimedia.org/T95806

On Fri, Apr 10, 2015 at 8:51 PM, Yuri Astrakhan <yastrakhan@wikimedia.org> wrote:
Please clarify why the field "is_zero" is needed, since it is nothing more than a test for ("zero=" in x_analytics). Does having this field significantly improve performance for Zero queries, e.g. "select count(*) from requests where is_zero = true"? Otherwise it simply identifies "zero partner" traffic, not whether the request was actually zero-rated.
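
To illustrate the test I mean, here is a minimal Python sketch. It is not the refinery code: the function names and the sample header values are made up for illustration, and I am assuming x_analytics is a ";"-separated list of key=value pairs.

```python
def parse_x_analytics(header):
    """Split a 'k1=v1;k2=v2' style string into a dict (assumed format)."""
    fields = {}
    for pair in header.split(";"):
        key, sep, value = pair.strip().partition("=")
        if sep and key:
            fields[key] = value
    return fields


def is_zero_substring(header):
    # The plain test mentioned above: "zero=" appears somewhere in x_analytics.
    return "zero=" in header


def is_zero_parsed(header):
    # Same idea, but checks for an actual "zero" key after parsing.
    return "zero" in parse_x_analytics(header)


if __name__ == "__main__":
    # Hypothetical header values, for illustration only.
    for sample in ["zero=250-99;https=1", "https=1"]:
        print(sample, is_zero_substring(sample), is_zero_parsed(sample))
```

Either way, the result only says that the request carried a zero tag, i.e. that it came through a partner. It says nothing about whether the request was actually zero-rated.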

Thanks!

On Fri, Apr 10, 2015 at 5:16 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
Cool!

On 10 April 2015 at 17:12, Joseph Allemandou <jallemandou@wikimedia.org> wrote:
> Yes Oliver, agent_type = spider includes the results of the IsCrawler UDF.
>
> On Fri, Apr 10, 2015 at 11:08 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
>>
>> What does agent_type add? If we're already pre-parsing the user agent,
>> surely the only difference is between "WHERE agent_type != 'spider'"
>> and "WHERE user_agent_map['device_family'] != 'Spider'"?
>> Does agent_type include the isCrawler UDF results?
>>
>> On 10 April 2015 at 16:47, Joseph Allemandou <jallemandou@wikimedia.org> wrote:
>> > And I forgot one field:
>> >
>> > is_zero - True if a request is made on a zero provider.
>> >
>> >
>> > On Fri, Apr 10, 2015 at 10:36 PM, Leila Zia <leila@wikimedia.org> wrote:
>> >>
>> >> Hi Joseph,
>> >>
>> >>    Thanks for the update, and for doing this. These three items make
>> >> the analysis of the data much easier on our end. We've had many requests
>> >> in the past that required agent_type and access_method information, and
>> >> having them readily available is awesome! :-)
>> >>
>> >> Have a great weekend!
>> >>
>> >> Leila
>> >>
>> >> On Fri, Apr 10, 2015 at 1:21 PM, Joseph Allemandou <jallemandou@wikimedia.org> wrote:
>> >>>
>> >>> Hi Analytics people,
>> >>>
>> >>> Today brings another batch of additions to the refined webrequest
>> >>> table in Hive.
>> >>> The table now also contains:
>> >>>
>> >>> ts - The Unix timestamp (in milliseconds) corresponding to the dt date
>> >>> access_method - The method used to access the site, one of
>> >>> [mobile app | mobile web | desktop]
>> >>> agent_type - To differentiate easily between spiders and users (more
>> >>> values may be added later).
>> >>>
>> >>> These additions are based on the "tags", as defined here:
>> >>> https://meta.wikimedia.org/wiki/Research:Page_view
>> >>>
>> >>> Have a good weekend!
>> >>>
>> >>> --
>> >>> Joseph Allemandou
>> >>> Data Engineer @ Wikimedia Foundation
>> >>> IRC: joal
>> >>>
>> >>> _______________________________________________
>> >>> Analytics mailing list
>> >>> Analytics@lists.wikimedia.org
>> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>>
>>
>>
>>
>> --
>> Oliver Keyes
>> Research Analyst
>> Wikimedia Foundation
>>