On 22 March 2018 at 13:41, Neil Patel Quinn <nquinn(a)wikimedia.org> wrote:
Both the edit data and pageview data that you're talking about come from
the Hadoop-based Analytics Data Lake
<https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake>. However,
because of limitations in the underlying MediaWiki application databases
<https://www.mediawiki.org/wiki/Manual:Database_layout> *that are the
source of the edit data*, the data requires some complex reconstruction and
denormalization
<https://wikitech.wikimedia.org/wiki/Analytics/Systems/Data_Lake/Edits/Pipeline>
that takes several days to a week.