On 11-09-19 12:57 PM, Brion Vibber wrote:
> On Mon, Sep 19, 2011 at 12:53 PM, Asher Feldman
> <afeldman(a)wikimedia.org> wrote:
>> Since the primary use case here seems to be offline analysis and it may
>> not be of much interest to mediawiki users outside of wmf, can we store
>> the checksums in new tables (i.e. revision_sha1) instead of running large
>> alters, and implement the code to generate checksums on new edits via an
>> extension?
>>
>> Checksums for most old revs can be generated offline and populated before
>> the extension goes live. Since nothing will be using the new table yet,
>> there'd be no issues with things like gap lock contention on the revision
>> table from mass populating it.
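The side table Asher proposes could be sketched like this (a minimal sqlite3 stand-in; the table and column names, and the simplified `revision` layout, are illustrative assumptions, not MediaWiki's actual schema):

```python
# Sketch of the side-table idea: create revision_sha1 separately so the
# big revision table never needs an ALTER, then backfill it offline.
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_text TEXT);
    -- New side table: populated out of band, no ALTER of revision needed.
    CREATE TABLE revision_sha1 (
        rs_rev_id INTEGER PRIMARY KEY,
        rs_sha1   TEXT NOT NULL
    );
""")
db.executemany("INSERT INTO revision VALUES (?, ?)",
               [(1, "first revision"), (2, "second revision")])

# Offline backfill: walk existing revisions and insert their checksums.
for rev_id, text in db.execute("SELECT rev_id, rev_text FROM revision"):
    sha1 = hashlib.sha1(text.encode("utf-8")).hexdigest()
    db.execute("INSERT INTO revision_sha1 VALUES (?, ?)", (rev_id, sha1))
db.commit()
print(db.execute("SELECT COUNT(*) FROM revision_sha1").fetchone()[0])  # 2
```

Since the backfill only ever inserts into the new, otherwise-unused table, it can run in batches without contending with live edits.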
> That's probably the simplest solution; adding a new empty table will be
> very quick. It may make it slower to use the field though, depending on
> what all uses/exposes it.
>
> During stub dump generation for instance this would need to add a left
> outer join on the other table, and add things to the dump output (and
> also needs an update to the XML schema for the dump format). This would
> then need to be preserved through subsequent dump passes as well.
>
> -- brion
Revision is going to need to either make a JOIN whenever it grabs
revision info, or make an additional db query whenever someone does use
the checksum.
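That join would have to be an outer one, since old revisions may not have a checksum row yet. A minimal sqlite3 sketch (table and column names are illustrative assumptions, not MediaWiki's actual schema):

```python
# Sketch of the LEFT OUTER JOIN a Revision fetch would need once the
# checksum lives in a side table.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_text TEXT);
    CREATE TABLE revision_sha1 (rs_rev_id INTEGER PRIMARY KEY, rs_sha1 TEXT);
""")
db.execute("INSERT INTO revision VALUES (1, 'has a checksum')")
db.execute("INSERT INTO revision VALUES (2, 'not yet populated')")
db.execute("INSERT INTO revision_sha1 VALUES (1, 'deadbeef')")

# The outer join keeps revisions whose checksum row is missing, so the
# caller sees NULL instead of losing the revision from the result set.
rows = db.execute("""
    SELECT r.rev_id, rs.rs_sha1
    FROM revision r
    LEFT OUTER JOIN revision_sha1 rs ON rs.rs_rev_id = r.rev_id
    ORDER BY r.rev_id
""").fetchall()
print(rows)  # [(1, 'deadbeef'), (2, None)]
```

An inner join would silently drop every revision whose checksum hasn't been backfilled yet, which is why the outer variant (and the extra cost Brion mentions) is unavoidable here.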
Btw, instead of having Revision return a checksum string and needing to
check what type it is with a second method (best to program generically
in case we do switch checksum types), how about we return an instance of
a simple SHA1 wrapper class? We could have an MD5 one too, and use a
simple descriptive method instead of having to manually call
wfBaseConvert with the right args when you want something safe for
filesystem use.
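A rough Python sketch of that wrapper idea (the class and method names are invented for illustration; the base-36 conversion stands in for what wfBaseConvert would do in MediaWiki):

```python
# Sketch: the checksum object carries its own type, and a descriptive
# method replaces hand-rolled base conversion at every call site.
import hashlib

class Checksum:
    algorithm = None  # set by subclasses

    def __init__(self, hex_digest: str):
        self.hex = hex_digest

    @classmethod
    def of(cls, text: str) -> "Checksum":
        digest = hashlib.new(cls.algorithm, text.encode("utf-8")).hexdigest()
        return cls(digest)

    def for_filesystem(self) -> str:
        # Base-36 form: shorter than hex and safe on case-insensitive
        # filesystems; the caller never touches conversion args.
        n = int(self.hex, 16)
        digits = "0123456789abcdefghijklmnopqrstuvwxyz"
        out = ""
        while n:
            n, r = divmod(n, 36)
            out = digits[r] + out
        return out or "0"

class Sha1Checksum(Checksum):
    algorithm = "sha1"

class Md5Checksum(Checksum):
    algorithm = "md5"

c = Sha1Checksum.of("some revision text")
print(type(c).__name__, len(c.hex))  # Sha1Checksum 40
```

Callers can then check the type with `isinstance` (or swap in Md5Checksum) without ever inspecting the string, which is the generic-programming point above.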
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]