On Mon, Sep 19, 2011 at 12:53 PM, Asher Feldman
<afeldman [at] wikimedia> wrote:
> Since the primary use case here seems to be offline analysis and it may not
> be of much interest to mediawiki users outside of wmf, can we store the
> checksums in new tables (i.e. revision_sha1) instead of running large
> alters, and implement the code to generate checksums on new edits via an
> extension?
>
> Checksums for most old revs can be generated offline and populated before
> the extension goes live. Since nothing will be using the new table yet,
> there'd be no issues with things like gap lock contention on the revision
> table from mass populating it.
That's probably the simplest solution; adding a new empty table will be very
quick. It may make it slower to use the field though, depending on what all
uses/exposes it.
During stub dump generation for instance this would need to add a left outer
join on the other table, and add things to the dump output (and also needs
an update to the XML schema for the dump format). This would then need to be
preserved through subsequent dump passes as well.
-- brion
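To make the separate-table option concrete, here is a minimal sketch of the offline backfill and the left outer join a dump pass would need. It uses Python's sqlite3 and hashlib purely for illustration. The schema is simplified and hypothetical: the rev_text column, the rs_rev_id and rs_sha1 column names, and the hex digest encoding are all assumptions, not the real MediaWiki layout.

```python
import hashlib
import sqlite3


def build_demo_db():
    """Create an in-memory database with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE revision (
            rev_id   INTEGER PRIMARY KEY,
            rev_text TEXT NOT NULL          -- stand-in for the real text storage
        );
        CREATE TABLE revision_sha1 (
            rs_rev_id INTEGER PRIMARY KEY,  -- hypothetical column names
            rs_sha1   TEXT NOT NULL
        );
    """)
    conn.executemany(
        "INSERT INTO revision (rev_id, rev_text) VALUES (?, ?)",
        [(1, "first revision"), (2, "second revision"), (3, "third revision")],
    )
    return conn


def populate_checksums(conn, rev_ids):
    """Offline backfill: hash the text of the given revisions into the side table."""
    for rev_id in rev_ids:
        (text,) = conn.execute(
            "SELECT rev_text FROM revision WHERE rev_id = ?", (rev_id,)
        ).fetchone()
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        conn.execute(
            "INSERT OR REPLACE INTO revision_sha1 (rs_rev_id, rs_sha1) VALUES (?, ?)",
            (rev_id, digest),
        )
    conn.commit()


def dump_rows(conn):
    """Read every revision with its checksum, or NULL where none exists yet."""
    return conn.execute("""
        SELECT r.rev_id, s.rs_sha1
        FROM revision AS r
        LEFT OUTER JOIN revision_sha1 AS s ON s.rs_rev_id = r.rev_id
        ORDER BY r.rev_id
    """).fetchall()


conn = build_demo_db()
populate_checksums(conn, [1, 2])  # revision 3 deliberately left unpopulated
rows = dump_rows(conn)
```

The outer join matters because revisions that have not been backfilled yet must still appear in the output, with a NULL checksum. An inner join would silently drop them.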
Can we resist the temptation to implement schema changes as new tables
purely to make life easier for Wikimedia? Core schema changes are certainly
enough of a hurdle to warrant serious discussion, but they are not the
totally-intractable mess that they used to be. 1.19 already includes index
changes to the user and logging tables; it will already require the full
game of musical chairs with the db slaves. Implementing this as a new
column does not actually make things any more complicated; it would just
mean that an operation that would have taken three hours might now take
five.
It may or may not be an architecturally-better design to have it as a
separate table, although considering how rapidly MW's 'architecture' changes
I'd say keeping things as simple as possible is probably a virtue. But
architectural merit is the basis on which we should be deciding it. This is
a big project which
still retains its enthusiasm because we recognise that it has equally big
potential to provide interesting new features far beyond the immediate
use cases we can construct now (dump validation and 'something to do with
reversions'). Let's not hamstring it at birth based on the operational
pressures of the one MediaWiki end user who is best placed to overcome said
issues.
--HM