On 11-09-19 12:57 PM, Brion Vibber wrote:
On Mon, Sep 19, 2011 at 12:53 PM, Asher Feldman <afeldman@wikimedia.org> wrote:
Since the primary use case here seems to be offline analysis and it may not be of much interest to MediaWiki users outside of WMF, can we store the checksums in new tables (e.g. revision_sha1) instead of running large alters, and implement the code to generate checksums on new edits via an extension?
Checksums for most old revisions can be generated offline and populated before the extension goes live. Since nothing will be using the new table yet, mass-populating it wouldn't cause issues like the gap lock contention we'd risk on the revision table itself.
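The extension side could be tiny - something like this, totally untested, assuming a revision_sha1 table with rs_rev_id / rs_sha1 columns (made-up names) and the RevisionInsertComplete hook:

// Compute and store a checksum whenever a new revision is saved.
// Table / column names here are placeholders, not a real schema.
$wgHooks['RevisionInsertComplete'][] = 'efStoreRevisionSha1';

function efStoreRevisionSha1( $revision, $data, $flags ) {
    $dbw = wfGetDB( DB_MASTER );
    $dbw->insert(
        'revision_sha1',
        array(
            'rs_rev_id' => $revision->getId(),
            'rs_sha1'   => sha1( $data ), // $data is the revision text
        ),
        __METHOD__
    );
    return true;
}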
That's probably the simplest solution; adding a new empty table will be very quick. It may make the field slower to use, though, depending on what ends up using or exposing it.
During stub dump generation, for instance, this would mean adding a LEFT OUTER JOIN against the new table and extra fields in the dump output (which also needs an update to the XML schema for the dump format). The checksums would then need to be preserved through subsequent dump passes as well.
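Roughly, the stub query would grow a join like this (sketch only, reusing Asher's revision_sha1 table with the rs_rev_id / rs_sha1 column names from his mail):

// LEFT JOIN so revisions without a checksum row still dump cleanly,
// with rs_sha1 coming back NULL for them.
$dbr = wfGetDB( DB_SLAVE );
$res = $dbr->select(
    array( 'revision', 'revision_sha1' ),
    array( 'rev_id', 'rev_page', 'rev_timestamp', 'rs_sha1' ),
    $conds, // whatever conditions the stub pass already uses
    __METHOD__,
    array(),
    array( 'revision_sha1' => array( 'LEFT JOIN', 'rs_rev_id = rev_id' ) )
);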
-- brion
Revision is going to need to either add a JOIN whenever it grabs revision info, or make an additional db query whenever someone actually uses the checksum.
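The extra-query route could at least be lazy, e.g. (sketch, column names borrowed from above):

// Inside Revision: only hit revision_sha1 when something actually
// asks for the checksum, and cache the result on the object.
public function getSha1() {
    if ( $this->mSha1 === null ) {
        $dbr = wfGetDB( DB_SLAVE );
        $this->mSha1 = $dbr->selectField(
            'revision_sha1',
            'rs_sha1',
            array( 'rs_rev_id' => $this->getId() ),
            __METHOD__
        );
    }
    return $this->mSha1;
}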
Btw, instead of having Revision return a checksum string and needing a second method to check what type it is (best to program generically in case we do switch checksum types), how about we return an instance of a simple SHA1 wrapper class? We could have an MD5 one too, and use a simple descriptive method instead of manually calling wfBaseConvert with the right args whenever you want something good for filesystem use.
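Something like this (rough sketch; class and method names are just what came to mind):

// Wraps a hex SHA-1 digest; callers ask for the encoding they want
// instead of juggling wfBaseConvert arguments themselves.
class Sha1Checksum {
    protected $hex; // 40-char hex digest

    public function __construct( $hex ) {
        $this->hex = $hex;
    }

    public function getHex() {
        return $this->hex;
    }

    // Base-36, zero-padded to 31 chars (the base-36 length of a
    // 160-bit value) - shorter and filesystem-safe.
    public function getBase36() {
        return wfBaseConvert( $this->hex, 16, 36, 31 );
    }
}

An MD5 wrapper would look the same, and callers would just do something like $rev->getChecksum()->getBase36() without caring which algorithm is underneath.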
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]