Hi Analytics Fellows,
*TL;DR:*
Yesterday we broke, and then fixed, the Hive wmf.webrequest table.
Jobs not monitored by the Analytics team might have failed, so please check
your logs :)
*Long story:*
Yesterday at 9am UTC we deployed a change to the Hive wmf.webrequest table
that broke some of its functionality. More precisely, queries to the table
that needed to read Parquet columns in detail would fail with a Hive
internal error.
The problem went unnoticed for a few hours, since most of our complex
computation jobs run only at night; we only became aware of it at around
18:00 UTC (kudos @bearloga!).
We quickly fixed the issue and reran the affected jobs over the
problematic period.
Given how the affected jobs failed, we are sure that there has been no
data corruption: jobs failed before they started computing anything. For
the production jobs we monitor, we know which ones failed and we have taken
care of them. However, for jobs that are not monitored (report-updater,
manual scripts, etc.), some silent failures might have occurred. Please
check your logs :)
Cheers
--
*Joseph Allemandou*
Data Engineer @ Wikimedia Foundation
IRC: joal