This is probably my bad, but I understood the goal to be having a
single db containing unified core tables. So, we'd have one db, with one
revision table, that'd have an extra "wiki" column denoting the
project the entry referred to. This would let us perform global queries
without the complex UNIONs mentioned above. Is this still the goal, or...?
No, that wasn't the goal. Sorry if there was miscommunication. The
actual data will remain in separate wikis using regular replication.
However, it's quite possible to create one or more unified databases
with (for example) SQL VIEWs that union all tables from a set of
pre-defined wikis, with 'wiki' columns, just as you describe. Same thing,
really. We could even allow ad-hoc creation of unified views for whatever
.dblist is appropriate for the project. I don't think anything need be
ruled out yet -- that's the whole point of SQL, right? Slow, but flexible.
:-)
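
To make the unified-view idea concrete, here is a rough sketch of how such a view could be generated from a list of wikis (e.g. the contents of a .dblist). It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 with attached in-memory databases standing in for the per-wiki MySQL databases, the helper name `build_union_view_sql` and the view name `revision_all` are made up, and only two columns of the revision table are included. SQLite needs a TEMP view to span attached databases; on MySQL the equivalent would presumably be a plain CREATE VIEW in a dedicated database.

```python
import sqlite3


def build_union_view_sql(view_name, table, columns, wikis):
    """Build a CREATE VIEW statement that unions `table` across `wikis`.

    Hypothetical helper: each SELECT adds a literal 'wiki' column naming
    the database the row came from.
    """
    # Identifiers cannot be bound as SQL parameters, so validate them here.
    for name in [view_name, table, *columns, *wikis]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    cols = ", ".join(columns)
    selects = [f"SELECT '{w}' AS wiki, {cols} FROM {w}.{table}" for w in wikis]
    # TEMP is a SQLite requirement for views that reference attached databases.
    return f"CREATE TEMP VIEW {view_name} AS\n" + "\nUNION ALL\n".join(selects)


# Stand-in for a .dblist: one database per wiki, each with its own revision table.
wikis = ["enwiki", "dewiki"]
conn = sqlite3.connect(":memory:")
for w in wikis:
    conn.execute(f"ATTACH DATABASE ':memory:' AS {w}")
    conn.execute(f"CREATE TABLE {w}.revision (rev_id INTEGER, rev_page INTEGER)")

conn.executemany("INSERT INTO enwiki.revision VALUES (?, ?)", [(1, 10), (2, 11)])
conn.executemany("INSERT INTO dewiki.revision VALUES (?, ?)", [(1, 20)])

conn.execute(
    build_union_view_sql("revision_all", "revision", ["rev_id", "rev_page"], wikis)
)

# A "global" query without hand-written UNIONs.
rows = conn.execute(
    "SELECT wiki, COUNT(*) FROM revision_all GROUP BY wiki ORDER BY wiki"
).fetchall()
print(rows)
```

Because the view is generated from the wiki list, creating an ad-hoc view for a different .dblist would just mean calling the helper with a different list.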
That would work. Oliver is right that creating views for core tables in
pre-defined wikis (say, all Wikipedias) would be valuable. Sean, how about
we create a page on wikitech with requirements for these views and take
it from there?
Union-ified views sound great here. Let's see how they perform. I bet
they'll be fine but if they're not, maybe we can throw them into Hadoop?
Using the views to do the MySQL -> Hadoop replication would be so much
easier than going to each database individually.
Totally down for that, but...