Hey Analytics!

I'm working on updating the Wikitech Analytics documentation based on my new understanding of the Data Lake. I've already clarified that there's no separate thing called the "Data Warehouse" (other than some experiments from 2015), but I still don't understand the difference between the Analytics Cluster and the Data Lake.

From what I learned yesterday, the Data Lake is everything stored in the Hadoop cluster (including pageview, mediacounts, last-access, and edit history data), even when those datasets can't be usefully joined together.

But that seems to be the same thing as the Analytics Cluster ("the Hadoop cluster and its related components"). Is it possible to pick one name ("Data Lake" or "Analytics Cluster") and stick with it? I promise you it'll make the whole system much easier to understand for outsiders :)

--
Neil Patel Quinn, product analyst
Wikimedia Foundation