Hi all,
the webrequest and pageview_hourly tables on Hive contain the very useful user_agent_map field, which stores the following data extracted from the raw user agent (still available as a separate field): device_family, browser_family, browser_major, os_family, os_major, os_minor and wmf_app_version. (The Analytics Engineering team has built a dashboard that uses this data and last month published a popular blog post about it.) I understand it is mainly based on the ua-parser library (http://www.uaparser.org/ ) .
In contrast, the event capsule in our EventLogging tables only contains the raw, unparsed user agent.
* Does anyone on this list have experience in parsing user agents in EventLogging data for the purpose of detecting browser family, version etc, and would like to share advice on how to do this most efficiently? (In the past, I have written some expressions in MySQL to extract the app version number for the Wikipedia apps. But it seems a bit of a pain to do that for classifying browsers in general. One option would be to export the data and use the Python version of ua-parser, however doing it directly in MySQL would fit better into existing workflows.)
* Assuming it is technically possible to add such a pre-parsed user_agent_map field to the EventLogging tables, would other analysts be interested in using it too?
This came up recently with the Reading web team, for the purpose of investigating whether certain issues are caused by certain browsers only. But I imagine it has arisen in other places as well.