On Sun, Sep 18, 2011 at 2:33 AM, Ariel T. Glenn <ariel(a)wikimedia.org> wrote:
On Sat, 17-09-2011, at 22:55 -0700, Robert Rohde wrote:
On Sat, Sep 17, 2011 at 4:56 PM, Anthony <wikimail(a)inbox.org> wrote:
<snip>
For offline analyses, there's no need to change the online database tables.
Need? That's debatable, but one of the major motivators is the desire
to have hash values in database dumps (both for revert checks and for
checksums on correct data import / export). Both of those are
"offline" uses, but it is beneficial to have that information
precomputed and stored rather than frequently regenerated.
If we don't have it in the online database tables, that defeats the
purpose of having the value at all, at least as far as generating the
XML dumps is concerned.
Recall that the dumps are generated in two passes; in the first pass we
retrieve from the db and record all of the metadata about revisions, and
in the second (time-consuming) pass we re-use the text of the revisions
from a previous dump file if the text is in there. We want to compare
the hash of that text against what the online database says the hash is;
if they don't match, we want to fetch the live copy.
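
As a rough illustration of the second-pass check described above, here is a
minimal Python sketch. The function names (text_digest,
resolve_revision_text), the dictionary standing in for the previous dump
file, the fetch_live callback, and the choice of SHA-1 are all assumptions
made for the example; none of it is taken from the actual dump scripts.

```python
import hashlib


def text_digest(text: str) -> str:
    """Hex digest of the revision text (SHA-1 is an assumption here)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def resolve_revision_text(rev_id, db_hash, previous_dump, fetch_live):
    """Return the text to write into the new dump for one revision.

    previous_dump: hypothetical mapping of rev_id -> text from the old dump.
    db_hash:       the hash recorded for this revision in the online database.
    fetch_live:    hypothetical callable that retrieves the live text.
    """
    cached = previous_dump.get(rev_id)
    # Re-use the old text only if it is present and its hash matches the
    # value stored in the database.
    if cached is not None and text_digest(cached) == db_hash:
        return cached
    # Missing or mismatched: fall back to the live copy.
    return fetch_live(rev_id)
```

Note that this only detects a stale or corrupted copy when the two hashes
actually differ; two different texts with the same hash would pass the
check.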
Well, this is exactly the type of use in which collisions do matter.
Do you really want the dump to not record the correct data when some
miscreant creates an intentional collision?