On 30 April 2014 06:59, Dan Andreescu
<dandreescu(a)wikimedia.org> wrote:
This is awesome, thank you, Sean.
> This is probably my bad, but I understood the goal to be a single db containing
> unified core tables. So, we'd have one db, with one revision table, that'd have an
> extra "wiki" column denoting the project each entry refers to. This would let us
> perform global queries without the complex UNIONs mentioned above. Is this still
> the goal, or...?
No, that wasn't the goal. Sorry if there was miscommunication. The actual data will
remain in separate per-wiki databases, using regular replication.
However, it's quite possible to create one or more unified databases with (for
example) SQL VIEWs that UNION ALL the same table from a predefined set of wikis and
add a 'wiki' column, just as you describe. Same thing, really. We could even allow
ad-hoc creation of unified views for whatever .dblist is appropriate for the project. I
don't think anything need be ruled out yet -- that's the whole point of SQL,
right? Slow, but flexible. :-)
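A minimal sketch of the ad-hoc unified-view idea, assuming a .dblist holds one database name per line (blank lines and '#' comments skipped) and that every wiki's tables are reachable as `<dbname>.<table>` on the same MySQL server. The function names, the `wikipedias` target database and the column list are hypothetical; the script only builds a `CREATE OR REPLACE VIEW ... UNION ALL ...` statement and does not touch a server.

```python
import re

# Assumption: db, table and column names are plain identifiers such as "enwiki";
# anything else is rejected so it cannot break the generated SQL.
_IDENT = re.compile(r"[A-Za-z0-9_]+")


def _check(name: str) -> str:
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unexpected identifier: {name!r}")
    return name


def read_dblist(text: str) -> list[str]:
    """Parse .dblist contents (assumed format: one db name per line)."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(_check(line))
    return names


def union_view_sql(target_db: str, table: str, wikis: list[str], columns: list[str]) -> str:
    """Build a MySQL CREATE VIEW that UNION ALLs `table` from every wiki,
    prefixing each row with a literal 'wiki' column naming its source db."""
    if not wikis:
        raise ValueError("need at least one wiki")
    cols = ", ".join(f"`{_check(c)}`" for c in columns)
    selects = [
        f"SELECT '{_check(w)}' AS wiki, {cols} FROM `{w}`.`{_check(table)}`"
        for w in wikis
    ]
    header = f"CREATE OR REPLACE VIEW `{_check(target_db)}`.`{table}` AS\n"
    return header + "\nUNION ALL\n".join(selects) + ";"


if __name__ == "__main__":
    # Hypothetical two-wiki list standing in for a real .dblist file.
    wikis = read_dblist("# hypothetical list\nenwiki\ndewiki\n")
    print(union_view_sql("wikipedias", "revision", wikis, ["rev_id", "rev_page", "rev_timestamp"]))
```

For `enwiki` and `dewiki` this yields two SELECTs joined by one UNION ALL. UNION ALL is used rather than UNION because the literal wiki column already makes rows from different wikis distinct, so duplicate elimination would only add cost.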
That would work. Oliver is right that creating views of the core tables for predefined
sets of wikis (say, all Wikipedias) would be valuable. Sean, how about we create a page
on Wikitech listing the requirements for these views and take it from there?
Union-ified views sound great here. Let's see how they perform. I bet they'll
be fine, but if they're not, maybe we can throw them into Hadoop? Using the views to
do the MySQL -> Hadoop replication would be so much easier than going to each database
individually.