Hi everyone,
As we do more frequent deploys, it's going to become critical that we get database schema changes correct, and that we do so in a way that gives us time to prepare for said changes and roll back to old versions of the software should a deploy go poorly. This applies both to MediaWiki core and to WMF-deployed extensions.
I'd like to propose that we make the following standard practice:
1. All schema changes must go through a period of being optional. For example, instead of changing the format of a column, create a new column, write to both the old and the new column (when the new column exists), and deprecate use of the old column. Check that the new column exists before assuming that it does. Only drop support for the old column once it's clear the schema migration has completed and there's no chance we'll need to roll back to the old version of the software.
2. There will be cases where rule #1 is prohibitively expensive from a performance perspective. However, schema changes like that should be rare to begin with, and should have prominent discussion on this list. Where it's impossible to follow rule #1, it is still critical to write scripts that roll back to the pre-change state.
3. For anything that involves a schema change to the production dbs, make sure Asher Feldman (afeldman@wikimedia.org) is on the reviewer list. He's already keeping an eye on this stuff the best he can, but it's going to be easy for him to miss changes in extensions should they happen.
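The dual-write pattern from rule #1 can be sketched roughly as follows. This is a hypothetical illustration using SQLite, not actual MediaWiki code; the table, column, and function names are invented:

```python
import sqlite3

def column_exists(conn, table, column):
    # Inspect the live schema rather than assuming the migration has run.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return column in cols

def save_title(conn, page_id, title):
    # Always write the old column; write the new one only if it exists yet.
    conn.execute("UPDATE page SET page_title = ? WHERE page_id = ?",
                 (title, page_id))
    if column_exists(conn, "page", "page_title_v2"):
        conn.execute("UPDATE page SET page_title_v2 = ? WHERE page_id = ?",
                     (title.replace(" ", "_"), page_id))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT)")
conn.execute("INSERT INTO page (page_id, page_title) VALUES (1, 'old')")
save_title(conn, 1, "Main Page")   # works before the migration has run
conn.execute("ALTER TABLE page ADD COLUMN page_title_v2 TEXT")
save_title(conn, 1, "Main Page")   # now also populates the new column
```

The same code runs safely before, during, and after the migration, which is what makes a rollback of the software (without a rollback of the schema) possible.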
I don't have a strong opinion about whether we need to follow rule #1 above through an iteration of our six month tarball release cycle, but we at least need to follow it through the two week deployment cycle.
Assuming this seems sensible to everyone, I can update this page accordingly: http://www.mediawiki.org/wiki/Development_policy
(/me desperately tries to avoid yak shaving and updating the policy above for Git)
Rob
On Tue, Apr 24, 2012 at 5:52 PM, Rob Lanphier robla@wikimedia.org wrote:
I don't have a strong opinion about whether we need to follow rule #1 above through an iteration of our six month tarball release cycle, but we at least need to follow it through the two week deployment cycle.
I think rule #1 is unnecessary for tarballs, but it's definitely a good thing for deployments.
Other than that this all makes perfect sense to me.
Roan
On Tue, Apr 24, 2012 at 05:52:24PM -0700, Rob Lanphier wrote:
As we do more frequent deploys, it's going to become critical that we get database schema changes correct, and that we do so in a way that gives us time to prepare for said changes and roll back to old versions of the software should a deploy go poorly. This applies both to MediaWiki core and to WMF-deployed extensions.
I'd like to propose that we make the following standard practice:
I'm still new around here so pardon me if this sounds infeasible for us:
In other systems I've worked on, such problems have been solved by having each schema-breaking version provide schema *and data* migrations for both the forward *and backward* steps.
This means that the upgrade mechanism knew how to add or remove columns or tables *and* how to fill them with data (say, by concatenating two columns of the old schema). The same program would also take care to do the exact opposite steps in the migration's backward method, in case a rollback was needed.
The migrations themselves can be kept in the source tree, perhaps even versioned, with the current schema version kept in the database, so that both we and external users can at any time forward a database to any later version, automagically.
I think that both Ruby on Rails and Python/Django (with South) employ such schemes, and I've seen them work well in practice.
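The up/down style Faidon describes can be sketched like this. This is a hypothetical illustration (names invented; not Rails or South code): each migration bundles schema *and* data steps in both directions.

```python
import sqlite3

class MergeNameColumns:
    """Introduce a single full_name column derived from first/last."""

    def up(self, conn):
        conn.execute("ALTER TABLE user ADD COLUMN full_name TEXT")
        conn.execute("UPDATE user SET full_name = first || ' ' || last")
        # A real tool would also drop the old columns here once safe.

    def down(self, conn):
        # Reverse the data step so a rollback restores the old state.
        conn.execute(
            "UPDATE user SET"
            " first = substr(full_name, 1, instr(full_name, ' ') - 1),"
            " last = substr(full_name, instr(full_name, ' ') + 1)")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (first TEXT, last TEXT)")
conn.execute("INSERT INTO user VALUES ('Ada', 'Lovelace')")
m = MergeNameColumns()
m.up(conn)
```

Because `down` undoes the data transformation as well as the schema change, rolling back a version means running the backward methods in reverse order.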
Regards, Faidon
I am generally in favor of all of this, and in the meeting that preceded Rob's email I proposed that we develop a new schema migration tool for MediaWiki along similar lines. Such a beast would have to work in all deployment cases without modification (stock single-wiki installs, and WMF's many wikis spread across multiple masters with tiered replication), be idempotent when run across many databases, track version and state per migration, and include up/down steps in every migration.
There are open-source PHP migration tools modeled on those used by the popular Ruby and Python frameworks. I deployed https://github.com/davejkiger/mysql-php-migrations at kiva.org a couple of years ago, where it worked well and is still in use. Nothing will meet our needs off the shelf, though. At best, a good project could be forked into MediaWiki with modifications if the license allows it; more likely, one will serve as a model for our own development.
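A toy version of the tracked, idempotent runner described above might look like this (a hypothetical sketch, not an existing MediaWiki or mysql-php-migrations interface). Applied versions are recorded in the database itself, so re-running against any database is a no-op:

```python
import sqlite3

# Migrations keyed by version number; real ones would be files in the tree.
MIGRATIONS = {
    1: "CREATE TABLE page (page_id INTEGER PRIMARY KEY, title TEXT)",
    2: "ALTER TABLE page ADD COLUMN touched TEXT",
}

def migrate(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0
    for version in sorted(MIGRATIONS):
        if version > current:          # skip already-applied steps
            conn.execute(MIGRATIONS[version])
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))

# Stand-ins for many wikis spread across many database servers.
dbs = [sqlite3.connect(":memory:") for _ in range(3)]
for db in dbs:
    migrate(db)
    migrate(db)   # second run applies nothing: idempotent
```

State lives with the data, so each wiki's database can be at a different version and still be brought forward safely.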
On Tue, Apr 24, 2012 at 11:27 PM, Faidon Liambotis faidon@wikimedia.org wrote:
In other systems I've worked on, such problems have been solved by having each schema-breaking version provide schema *and data* migrations for both the forward *and backward* steps.
This means that the upgrade mechanism knew how to add or remove columns or tables *and* how to fill them with data (say, by concatenating two columns of the old schema). The same program would also take care to do the exact opposite steps in the migration's backward method, in case a rollback was needed.
Down migrations aid development; I find them most useful as documentation of prior state, making a migration readable as a diff. They generally aren't useful in production environments at scale, though, which developers removed from the workings of production need to be aware of. Even with transparent execution of migrations, the time it takes to apply changes will nearly always be far outside the acceptable bounds of an emergency response necessitating a code rollback. So except in obvious cases such as adding new tables, care is needed to keep forward migrations backward compatible with the code as much as possible.
The migrations themselves can be kept in the source tree, perhaps even versioned and with the schema version kept in the database, so that both we and external users can at any time forward their database to any later version, automagically.
Yep. That we have to pull in migrations from both core and many extensions (many projects, one migration system) while also running different sets of extensions across different wikis intermingling on the same database servers adds some complexity but we should get there.
-Asher
I once wrote a pretty decent schema migration tool that fits most if not all of these requirements. It was built for the Kohana PHP framework, but a lot of it is pretty independent of that. If someone ends up working on this I'd love to help and maybe share some code and ideas.
-Andrew Otto
http://ottomata.org http://www.flickr.com/photos/OttomatonA http://www.couchsurfing.org/people/otto
Here's the migrations library I wrote. :) https://github.com/ottomata/cs_migrations
-Andrew Otto
On 25/04/12 02:52, Rob Lanphier wrote:
3. For anything that involves a schema change to the production dbs, make sure Asher Feldman (afeldman@wikimedia.org) is on the reviewer list. He's already keeping an eye on this stuff the best he can, but it's going to be easy for him to miss changes in extensions should they happen.
I am pretty sure Jenkins could detect when a change touches a .sql file and then add a specific reviewer using the Gerrit CLI tool.
Logged as: https://bugzilla.wikimedia.org/36228
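The detection side of such a Jenkins job could be as simple as the sketch below. The git check is real; the Gerrit invocation in the comment is a placeholder, not a verified command:

```shell
#!/bin/sh
set -e
# Build a throwaway repo with a commit touching a .sql file, for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=ci@example.org -c user.name=ci commit -q --allow-empty -m base
echo "ALTER TABLE page ADD COLUMN touched TEXT;" > patch.sql
git add patch.sql
git -c user.email=ci@example.org -c user.name=ci commit -q -m "schema change"

# The check a Jenkins job could run on each new change:
if git diff --name-only HEAD~1..HEAD | grep -q '\.sql$'; then
    echo "schema change detected"
    # Here the job would add the DBA as a reviewer via the Gerrit CLI, e.g.
    # (command and host are placeholders):
    # ssh -p 29418 gerrit.example.org gerrit set-reviewers --add afeldman "$CHANGE_ID"
fi
```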
Thanks, hashar!
On Tue, Apr 24, 2012 at 5:52 PM, Rob Lanphier robla@wikimedia.org wrote:
Assuming this seems sensible to everyone, I can update this page with this: http://www.mediawiki.org/wiki/Development_policy
And this is done now.
In case you aren't using a threaded mail client, here's the original discussion: http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/60967
Rob
Hi all,
I'd like to remind everyone involved in development that requires db schema migrations - please keep in mind the three related guidelines in our official deployment policies - http://www.mediawiki.org/wiki/Development_policy#Database_patches - especially the third, which is to make schema changes optional.
Once a migration has been reviewed, please update http://wikitech.wikimedia.org/view/Schema_changes with all pertinent details, then get in touch for deployment scheduling. There are good and legitimate reasons not to follow the "make schema changes optional" policy, but if that's the case, please provide 3-7 days of lead time, depending on the size of the tables and the number of affected wikis.
Best, Asher
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l