Okay, so, I have tested this (to a limited degree: the work I'm doing that involves the dbs involves eventlogging, so this is mostly me making up excuses to run queries). Thoughts:
*We should probably put some kind of restriction around which dbs we care about. For example, I see the tables relating to the Wikimania and Arbcom wikis in there. This is not data I think we're ever going to care about, but it is data, which means we'll either have to write really complex UNIONs to gather global data, with a constantly-maintained list of dbs-we-don't-care-about, or accept inaccuracies in our data. My suggestion would be for these dbs to be removed and excluded from replication, using the noc dblists to identify the ones we don't care about; generally the "deleted", "closed", "special" and "wikimedia" wikis aren't things we want to be running queries over.
*This is probably my bad, but I understood the goal to be having a single db containing unified, core tables. So, we'd have one db, with one revision table, that'd have an extra "wiki" column denoting the project each entry refers to. This would let us perform global queries without the complex UNIONs mentioned above. Is this still the goal, or...?
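
To make the difference between the two layouts concrete, here is a rough sketch in Python. It uses in-memory SQLite as a stand-in for the real replicas, so it shows the shape of the queries only. The db names, the hard-coded exclusion set (which would really come from the noc dblists), the one-column revision table and all the function names are assumptions made up for illustration, not the actual setup.

```python
import sqlite3

# Hypothetical db names; the real list would be much longer.
ALL_DBS = ["enwiki", "dewiki", "arbcom_enwiki", "wikimania2013wiki"]
# Assumption: stands in for the contents of the "deleted", "closed",
# "special" and "wikimedia" dblists.
EXCLUDED = {"arbcom_enwiki", "wikimania2013wiki"}


def wanted_dbs(all_dbs, excluded):
    """Return the dbs we care about, preserving their original order."""
    return [db for db in all_dbs if db not in excluded]


def build_union_query(dbs):
    """Per-wiki layout: one SELECT per db, glued together with UNION ALL."""
    parts = [
        f"SELECT '{db}' AS wiki, COUNT(*) AS revisions FROM {db}.revision"
        for db in dbs
    ]
    return " UNION ALL ".join(parts)


# Unified layout: one revision table with an extra "wiki" column.
UNIFIED_QUERY = (
    "SELECT wiki, COUNT(*) AS revisions "
    "FROM main.revision GROUP BY wiki ORDER BY wiki"
)


def make_demo_connection():
    """Build toy data in both layouts inside one in-memory SQLite connection."""
    # Autocommit mode, so ATTACH never runs inside an open transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    for db in ALL_DBS:
        conn.execute(f"ATTACH DATABASE ':memory:' AS {db}")
    # Per-wiki layout: the i-th db gets i toy revisions.
    for i, db in enumerate(ALL_DBS, start=1):
        conn.execute(f"CREATE TABLE {db}.revision (rev_id INTEGER)")
        conn.executemany(
            f"INSERT INTO {db}.revision VALUES (?)",
            [(n,) for n in range(i)],
        )
    # Unified layout: only the dbs we care about are copied in.
    conn.execute("CREATE TABLE main.revision (wiki TEXT, rev_id INTEGER)")
    for db in wanted_dbs(ALL_DBS, EXCLUDED):
        conn.execute(
            f"INSERT INTO main.revision SELECT '{db}', rev_id FROM {db}.revision"
        )
    return conn


if __name__ == "__main__":
    conn = make_demo_connection()
    dbs = wanted_dbs(ALL_DBS, EXCLUDED)
    print(sorted(conn.execute(build_union_query(dbs)).fetchall()))
    print(conn.execute(UNIFIED_QUERY).fetchall())
```

In the per-wiki layout the query text grows with the number of dbs and the exclusion list has to be kept up to date wherever queries are built. In the unified layout the filtering happens once, at load time, and the global query stays a plain GROUP BY.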