On Mon, Sep 19, 2011 at 12:53 PM, Asher Feldman afeldman@wikimedia.org wrote:
Since the primary use case here seems to be offline analysis and it may not be of much interest to MediaWiki users outside of WMF, can we store the checksums in new tables (i.e. revision_sha1) instead of running large alters, and implement the code to generate checksums on new edits via an extension?
Checksums for most old revs can be generated offline and populated before the extension goes live. Since nothing will be using the new table yet, there'd be no issues with things like gap lock contention on the revision table from mass populating it.
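A rough sketch of the offline backfill proposed above, assuming a simplified schema. SQLite stands in for the production database so the example runs on its own. The column names (rev_text, rs_rev_id, rs_sha1), the hex digest format, and the batch size are illustrative assumptions, not anything settled in this thread; only the table name revision_sha1 comes from the proposal.

```python
import hashlib
import sqlite3


def populate_checksums(conn, batch_size=1000):
    """Backfill revision_sha1 in rev_id order, committing once per batch.

    Returns the number of revisions scanned. Only the new side table is
    written to; the revision table is read but never altered.
    """
    last_id = 0
    scanned = 0
    while True:
        rows = conn.execute(
            "SELECT rev_id, rev_text FROM revision "
            "WHERE rev_id > ? ORDER BY rev_id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return scanned
        conn.executemany(
            # OR IGNORE makes the job safe to re-run after an interruption.
            "INSERT OR IGNORE INTO revision_sha1 (rs_rev_id, rs_sha1) "
            "VALUES (?, ?)",
            [
                (rev_id, hashlib.sha1(text.encode("utf-8")).hexdigest())
                for rev_id, text in rows
            ],
        )
        conn.commit()
        last_id = rows[-1][0]
        scanned += len(rows)


def make_demo_db():
    """Build an in-memory database with the hypothetical simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_text TEXT)"
    )
    conn.execute(
        "CREATE TABLE revision_sha1 ("
        "rs_rev_id INTEGER PRIMARY KEY, rs_sha1 TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?)",
        [(1, "first"), (2, "second"), (3, "third")],
    )
    conn.commit()
    return conn
```

Keeping each batch in its own short transaction is one way to keep the backfill from holding locks for long, and keying the loop on rev_id lets it resume where it stopped.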
That's probably the simplest solution; adding a new empty table will be very quick. It may make it slower to use the field though, depending on what all uses/exposes it.
During stub dump generation, for instance, the query would need a left outer join on the other table, and the checksum would have to be added to the dump output (which also requires an update to the XML schema for the dump format). The field would then need to be preserved through subsequent dump passes as well.
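To make the join cost concrete, here is a minimal sketch of what the stub pass might look like with a side table. It assumes the same hypothetical columns (rs_rev_id, rs_sha1) and a made-up `<sha1>` element name; the real dump code and XML schema are not shown in this thread. SQLite is used only so the example is self-contained.

```python
import sqlite3
from xml.sax.saxutils import escape


def stub_revision_xml(conn):
    """Return one XML fragment per revision, with a checksum where one exists.

    The LEFT OUTER JOIN keeps revisions that have no row in revision_sha1
    yet; for those the sha1 column comes back as NULL (None in Python).
    """
    rows = conn.execute(
        "SELECT r.rev_id, s.rs_sha1 "
        "FROM revision AS r "
        "LEFT OUTER JOIN revision_sha1 AS s ON s.rs_rev_id = r.rev_id "
        "ORDER BY r.rev_id"
    )
    fragments = []
    for rev_id, sha1 in rows:
        parts = ["<revision>", f"<id>{rev_id}</id>"]
        if sha1 is not None:
            # Hypothetical element name; emitted only when a checksum is stored.
            parts.append(f"<sha1>{escape(sha1)}</sha1>")
        parts.append("</revision>")
        fragments.append("".join(parts))
    return fragments
```

An inner join would silently drop any revision not yet backfilled, which is why the outer join is needed for as long as the side table can be incomplete.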
-- brion