Hi everyone,

We now support querying EventLogging data through Hive tables!

This has been a long project, and we finally feel comfortable enough to announce support for this method of querying EventLogging data. Documentation on how to access the data is here:

https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging#Hadoop_&_Hive


The ‘event’ database in Hive now contains tables for most EventLogging schemas, including both ‘analytics’ schemas and some of the ‘EventBus’ schemas.

Since Hive is a strongly typed system, there are limitations on what data can be imported from JSON. The job that imports this data infers the field types from the data itself. If your data is ever produced with a field that has more than one type (e.g. string vs. object, integer vs. float), the import will fail for the entire hour in which the discrepancy occurs. Please be careful when you design your schemas, and with the code that emits the events. We’ve recently been putting together some draft guidelines for new schemas. Keep these in mind when you design new schemas :)
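
To make the mixed-type problem concrete, here is a rough Python sketch of a check you could run on your own events before emitting them. It is not the import job's actual logic: the function names, the sample events, and the field names (userId, duration) are made up for illustration, and it only looks at top-level fields.

```python
import json
from collections import defaultdict


def json_type(value):
    """Return a coarse JSON type name for a decoded value."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return "null"


def find_type_conflicts(lines):
    """Map each top-level field seen with more than one type to those types."""
    seen = defaultdict(set)
    for line in lines:
        event = json.loads(line)
        for field, value in event.items():
            # Nulls are skipped here; this is a simplifying assumption.
            if value is not None:
                seen[field].add(json_type(value))
    return {field: sorted(types) for field, types in seen.items() if len(types) > 1}


# Hypothetical events, one JSON document per line.
sample = [
    '{"userId": 1, "duration": 12}',
    '{"userId": 2, "duration": 12.5}',
    '{"userId": "3", "duration": 7}',
]

conflicts = find_type_conflicts(sample)
print(conflicts)
```

In this sample both fields are flagged: userId appears as an integer and as a string, and duration appears as an integer and as a float. Either mix is the kind of discrepancy described above.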


This is new and it could be buggy! Please let us know if you encounter problems while using this data.

- Your friendly Analytics engineering team