Hi Sean,
I am very excited about this. Thank you. :-) Re unified views:
On Wed, Apr 30, 2014 at 6:59 AM, Dan Andreescu <dandreescu@wikimedia.org> wrote:
This is awesome, thank you Sean
This is probably my bad, but I understood the goal to be having a single db containing unified core tables. So, we'd have one db, with one revision table, that'd have an extra "wiki" column denoting the project the entry referred to. This would let us perform global queries without the complex UNIONs mentioned above. Is this still the goal, or...?
No, that wasn't the goal. Sorry if there was miscommunication. The actual data will remain in separate wikis using regular replication.
However, it's quite possible to create one or more unified databases with (for example) SQL VIEWs that union all tables from a set of pre-defined wikis, with 'wiki' columns, just as you describe. Same thing, really. We could even allow ad-hoc creation of unified views for whatever .dblist is appropriate for the project. I don't think anything need be ruled out yet -- that's the whole point of SQL, right? Slow, but flexible. :-)
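To make the idea concrete, here is a minimal sketch of such a unified view. It is an illustration under stated assumptions, not the actual setup: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the helper unified_view_sql and the view name unified_revision are hypothetical, and tables are named enwiki_revision, dewiki_revision, and so on, where a real deployment would reference each wiki's own database (for example enwiki.revision). The list of wikis stands in for whatever .dblist applies.

```python
import sqlite3


def unified_view_sql(view_name, wikis, table, columns):
    """Build a CREATE VIEW statement that unions `table` across `wikis`.

    Hypothetical helper: each SELECT adds a literal 'wiki' column so rows
    can be traced back to the project they came from.
    """
    col_list = ", ".join(columns)
    selects = [
        f"SELECT '{wiki}' AS wiki, {col_list} FROM {wiki}_{table}"
        for wiki in wikis
    ]
    return f"CREATE VIEW {view_name} AS\n" + "\nUNION ALL\n".join(selects)


# Stand-in for a .dblist: the set of wikis the view should cover.
wikis = ["enwiki", "dewiki"]

conn = sqlite3.connect(":memory:")
for wiki in wikis:
    # Tiny per-wiki revision tables with two illustrative columns.
    conn.execute(
        f"CREATE TABLE {wiki}_revision (rev_id INTEGER, rev_page INTEGER)"
    )
conn.execute("INSERT INTO enwiki_revision VALUES (1, 10), (2, 11)")
conn.execute("INSERT INTO dewiki_revision VALUES (1, 20)")

view_sql = unified_view_sql(
    "unified_revision", wikis, "revision", ["rev_id", "rev_page"]
)
conn.execute(view_sql)

# A global query: one statement, no hand-written UNIONs.
rows = conn.execute(
    "SELECT wiki, COUNT(*) FROM unified_revision GROUP BY wiki ORDER BY wiki"
).fetchall()
print(rows)
```

Generating the statement from a list keeps ad-hoc views cheap to create for any set of wikis. How such a view performs across hundreds of real wikis is a separate question that this sketch does not address.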
That would work. Oliver is right that creating views for core tables in pre-defined wikis (say, all Wikipedias) would be valuable. Sean, how about we create a page on wikitech with requirements for these views and take it from there?
Union-ified views sound great here. Let's see how they perform. I bet they'll be fine but if they're not, maybe we can throw them into Hadoop? Using the views to do the MySQL -> Hadoop replication would be so much easier than going to each database individually.
Like Oliver, I also thought we would have everything in one database. I guess Oliver and I talk a lot to each other. ;-)
It will be great to have unified views.
Thanks,
Leila

P.S. Oliver, you are not hallucinating, as Dario confirmed, too.
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics