On Thu, Feb 21, 2008 at 6:38 PM, Simetrical Simetrical+wikilist@gmail.com wrote:
On Thu, Feb 21, 2008 at 12:56 PM, Alex Powell alexp@exscien.com wrote:
Yes live updates could go to both tables, or you can just do batch updates in the scripts that catch up with the live table. The main idea is not to lock the table to writes - I presumed that it would affect the site, as how do edits proceed whilst the table is locked??
The magic of replication. :) The one doing the update goes through each slave: taking it out of rotation, applying the change, re-adding it to rotation, and let it catch up to the master, then repeating with the next slave. Finally the master is switched, so that a former slave (with updated schema) is the new master, and the process is repeated one last time for the old master. The alter still locks the tables, but it does so while the slave is offline, so it doesn't block anything, and there are enough slaves that the site doesn't suffer too much from being short one for a while.
Of course, this depends on the schemas being similar enough that replicated statements still can be executed on the new database. But since, helpfully, Wikimedia develops its own software, it can ensure that that's always the case. It's mostly a case of ordering everything right: add any new fields, then update software so it no longer needs old fields, then remove old fields.
Clever. Forgot MySQL does replication on SQL statements. I imagine if you tried something like that on MS SQL the replication would break horribly, since it does binary replication. I guess in fact you are doing my suggestion, just at the column level, rather than table. Makes me want to go and set up a slave server ;)
Well to get back to the topic I think a master list of categories would be a good thing. I have an AJAX category suggester that does a SELECT DISTINCT WHERE LIKE, LIMIT, and I suspect that with as many categories as the wikipedia has it would prove to be very inefficient. A master list of categories would be a much easier way of getting it.