That's great and it will serve most of my use cases. Any chance we can get
that field added to the sampled logs & hourly counts?
On Wed, Jan 7, 2015 at 5:40 PM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
I am not sure if this is quite what you are asking, but just in case:
For streaming, it is probably easier for you to use the newly created
webrequest tables:
https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive#Webrequest_Table…
Those include an isPageview field so requests are pre-classified. You will
need to wait a bit as data for those tables is being populated starting
today.
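As a rough illustration of consuming pre-classified rows from a Python
script, here is a minimal sketch. It assumes the rows have been exported as
tab-separated text with a header line. The column names ("hour",
"is_pageview"), the "true"/"false" rendering of the boolean, and the sample
rows are all hypothetical, not taken from the actual webrequest schema.

```python
import csv
import io
from collections import Counter


def hourly_pageview_counts(lines, flag_column="is_pageview", hour_column="hour"):
    """Count rows flagged as pageviews, grouped by hour.

    Assumes tab-separated input with a header row. The column names are
    hypothetical placeholders, not the real webrequest schema.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    counts = Counter()
    for row in reader:
        # Assumption: the boolean flag is exported as the text "true"/"false".
        if row[flag_column].strip().lower() == "true":
            counts[row[hour_column]] += 1
    return counts


# Made-up sample rows, for illustration only.
sample = io.StringIO(
    "hour\turi_path\tis_pageview\n"
    "2015-01-07T17\t/wiki/Main_Page\ttrue\n"
    "2015-01-07T17\t/w/load.php\tfalse\n"
    "2015-01-07T18\t/wiki/Hive\ttrue\n"
)
counts = hourly_pageview_counts(sample)
for hour, n in sorted(counts.items()):
    print(hour, n)
```

Because the classification is already done upstream, the script only filters
on the flag and does not reimplement the pageview definition itself.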
On Wed, Jan 7, 2015 at 3:35 PM, Aaron Halfaker <ahalfaker(a)wikimedia.org>
wrote:
Cool! Let's say I want to review the filters and apply them in a Python
script. What should I reference?
On Wed, Jan 7, 2015 at 5:13 PM, Oliver Keyes <okeyes(a)wikimedia.org>
wrote:
I'm pleased to say we now have the prototype pageviews definition as a
UDF!
For those with cluster access:
CREATE TEMPORARY FUNCTION pageview as
'org.wikimedia.analytics.refinery.hive.isPageviewUDF';
...and then just apply it. It outputs a boolean, so you can easily go
WHERE pageview(fields) and treat it as a conditional. Great
success!
What this means for the definition is twofold: it means it's a lot
easier to test its accuracy, and it means that it's a lot easier to
make sure we're all using the same definition going forward. Once we
have the legacy definition as a UDF, refining and testing will proceed
at great speed, although I encourage anyone with time on their hands
who wants to help out to do some testing of their own :)
--
Oliver Keyes
Research Analyst
Wikimedia Foundation
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics