Okay, so, I have tested this (to a limited degree: the work I'm doing that involves the dbs involves EventLogging, so this is mostly me making up excuses to run queries). Thoughts:

* We should probably put some kind of restrictions around what we care about. For example, I see the tables relating to the Wikimania and Arbcom wikis in there. This is not data I think we're ever going to care about, but it is data, which means we'll either have to write really complex UNIONs to gather global data, with a constantly-maintained list of dbs-we-don't-care-about, or accept inaccuracies in our data. My suggestion would be for these dbs to be removed and excluded from replication, using the noc dblists to identify the ones we don't care about; generally, "deleted", "closed", "special" and "wikimedia" wikis aren't things we want to be running queries over.
* This is probably my bad, but I understood the goal to be having a single db containing unified, core tables. So, we'd have one db, with one revision table, that'd have an extra column of "wiki" that denoted the project the entry referred to. This would let us perform global queries without the complex UNIONs mentioned above. Is this still the goal, or...?
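
To make the per-wiki option concrete, here is a minimal Python sketch of what the "UNION plus exclusion list" approach might look like. Everything named here is an assumption for illustration: the helper names (wanted_dbs, global_revision_count_query), the sample db names, and the contents of the exclusion lists are made up, and the real dblists would have to be loaded from wherever they are published.

```python
import re


def wanted_dbs(all_dbs, excluded_lists):
    """Return the dbs not named in any exclusion list, preserving input order."""
    excluded = set()
    for names in excluded_lists.values():
        excluded.update(names)
    return [db for db in all_dbs if db not in excluded]


def global_revision_count_query(dbs):
    """Build one UNION ALL query that tags each row with its source wiki."""
    if not dbs:
        raise ValueError("no databases left after exclusion")
    parts = []
    for db in dbs:
        # Db names are interpolated into SQL, so accept only plain identifiers.
        if not re.fullmatch(r"[a-z0-9_]+", db):
            raise ValueError("unexpected db name: %r" % db)
        parts.append(
            "SELECT '{0}' AS wiki, COUNT(*) AS revisions "
            "FROM `{0}`.revision".format(db)
        )
    return "\nUNION ALL\n".join(parts)


# Illustrative input only: these db names and list memberships are made up.
all_dbs = ["enwiki", "dewiki", "arbcom_enwiki", "wikimania2014wiki"]
excluded_lists = {
    "special": ["wikimania2014wiki", "arbcom_enwiki"],
    "closed": [],
}

query = global_revision_count_query(wanted_dbs(all_dbs, excluded_lists))
print(query)
```

The exclusion list has to be re-applied every time a query is built, which is the maintenance burden in question; a single unified revision table with a "wiki" column would replace the generated UNION with a plain GROUP BY wiki.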
On 29 April 2014 18:07, Oliver Keyes <okeyes@wikimedia.org> wrote:
One word: YAY!
Thank you so much for this, Sean :D

On 29 April 2014 17:13, Sean Pringle <springle@wikimedia.org> wrote:
On Wed, Apr 30, 2014 at 6:01 AM, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:
Sean, consolation prizes are understated, this is terrific.

I just noticed that centralauth is not included, after EventLogging data this is the most useful database to have replicated on the big one box.

Good point. I had not granted access to centralauth for the 'research' user. Should work now.
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
Oliver Keyes
Research Analyst
Wikimedia Foundation