It's not just a question of which value to choose, but also how to sort.
It would be nice to be able to choose between alphabetical and numerical
sort order. It would also be nice to give each item label a default sort
key taken from the Wikipedia {{DEFAULTSORT}} template (though that won't
work for items without a Wikipedia article).
On Thu, Sep 15, 2016 at 10:18 AM, Dan Andreescu <dandreescu(a)wikimedia.org>
wrote:
The problem with working on EL data in Hive is that the schemas for the
tables can change at any point, in backwards-incompatible ways. And
maintaining tables dynamically is harder here than in the MySQL world
(where EL just tries to insert, and creates the table on failure). So,
while it's relatively easy to use ua-parser (see below), you can't easily
access EL data in Hive tables. However, we do have all EL data in Hadoop,
so you can access it with Spark. Andrew's about to answer with more
details on that.
I just thought this might be useful if you sqoop EL data from MySQL or
otherwise import it into a Hive table:
from stat1002, start hive, then:
ADD JAR /srv/deployment/analytics/refinery/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.35.jar;
CREATE TEMPORARY FUNCTION ua_parser as 'org.wikimedia.analytics.refinery.hive.UAParserUDF';
select ua_parser('Wikimedia Bot');
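
A rough sketch of how the same function might be applied to an imported
table instead of a literal string. The table name my_el_import is
hypothetical, the useragent column is assumed to hold the raw user agent
string, and the 'browser_family' map key is an assumption about what this
version of UAParserUDF returns, so check the output of the select above
first. It also assumes the ADD JAR and CREATE TEMPORARY FUNCTION
statements have already been run in the same hive session.

```sql
-- Sketch only: my_el_import is a hypothetical table name, and useragent is
-- assumed to be the column holding the raw user agent string.
-- Assumes ua_parser returns a map that has a 'browser_family' key.
SELECT
  ua_parser(useragent)['browser_family'] AS browser_family,
  COUNT(*) AS events
FROM my_el_import
GROUP BY ua_parser(useragent)['browser_family']
ORDER BY events DESC
LIMIT 20;
```

Grouping on a single map value rather than the whole map keeps the query
valid, since Hive does not group by map-typed columns.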
On Thu, Sep 15, 2016 at 1:06 AM, Federico Leva (Nemo) <
nemowiki(a)gmail.com> wrote:
Tilman Bayer, 15/09/2016 01:21:
> This came up recently with the Reading web team, for the purpose of
> investigating whether certain issues are caused by certain browsers
> only. But I imagine it has arisen in other places as well.
>
Definitely.
https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
Nemo
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics