Hey Analytics!
I'm working on updating the Wikitech Analytics documentation based on my new understanding of the Data Lake. I've already clarified that there's no separate thing called the "Data Warehouse" (other than some experiments from 2015), but I still don't understand the difference between the Analytics Cluster and the Data Lake.
From what I learned yesterday, the Data Lake is everything stored in the Hadoop cluster (including pageview, mediacounts, last-access, and edit history data), even when it can't be usefully joined together.