On Sun, Sep 18, 2011 at 1:55 AM, Robert Rohde <rarohde(a)gmail.com> wrote:
If collision attacks really matter we should use
If collision attacks really matter you should use, at least, SHA-256, no?
Do any of the proposed use cases care about whether someone might
intentionally inject a collision? In the proposed uses I've looked at,
it seems irrelevant. The intentional collision will get flagged
as a revert, and the text leading to that collision would be discarded.
How is that a bad thing?
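Here is a minimal sketch of the hash-based revert detection described above. It assumes revisions arrive in order as (rev_id, text) pairs. The function name find_reverts and that input shape are hypothetical and are not MediaWiki code. The sketch trusts digest equality completely, which is exactly why a deliberate collision would simply be reported as a revert.

```python
import hashlib
from typing import Iterable, List, Optional, Tuple


def find_reverts(revisions: Iterable[Tuple[int, str]],
                 algo: str = "sha1") -> List[Tuple[int, int]]:
    """Return (rev_id, reverted_to_rev_id) pairs.

    Hypothetical helper: a revision counts as a revert when its text digest
    equals the digest of some earlier revision, other than the one directly
    before it (identical consecutive text is treated as a null edit).
    """
    seen = {}  # digest -> id of the first revision that produced it
    reverts = []
    prev_digest: Optional[str] = None
    for rev_id, text in revisions:
        digest = hashlib.new(algo, text.encode("utf-8")).hexdigest()
        # Equal digests are assumed to mean equal text; a collision breaks this.
        if digest in seen and digest != prev_digest:
            reverts.append((rev_id, seen[digest]))
        seen.setdefault(digest, rev_id)
        prev_digest = digest
    return reverts
```

For example, find_reverts([(1, "good"), (2, "spam"), (3, "good")]) reports revision 3 as a revert to revision 1.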
Well, what if the checksum of the initial page hasn't been calculated
yet? Then some miscreant sets the page to spam that collides with it,
and then the spam gets reverted. The good page would be the one that gets
Maybe that's not feasible. Maybe it is. Either way, I'd feel very
uncomfortable about the fact that someday someone might decide to use
the checksums in some way in which collisions would matter.
Now, I don't know how large the CPU differences in calculating the
two versions would be. If they're significant enough, then fine, use
MD5, but make sure there are warnings all over the place about its
(As another possibility, what if someone writes a bot to detect
certain reverts? I can see spammers/vandals having a field day with
this sort of thing.)
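A rough throughput check with Python's hashlib can put a number on the CPU question for MD5, SHA-1 and SHA-256. The buffer size and repeat counts are arbitrary assumptions, and results depend heavily on the machine and the Python/OpenSSL build. The function name hash_throughput is hypothetical.

```python
import hashlib
import os
import timeit


def hash_throughput(algos=("md5", "sha1", "sha256"), size=1 << 20,
                    number=10, repeat=5):
    """Rough MB/s for each hash algorithm over one random buffer."""
    data = os.urandom(size)
    results = {}
    for name in algos:
        # Best of `repeat` timings, each hashing the buffer `number` times,
        # to reduce noise from other processes on the machine.
        best = min(timeit.repeat(lambda: hashlib.new(name, data).digest(),
                                 number=number, repeat=repeat))
        results[name] = (size * number / 1e6) / best
    return results


if __name__ == "__main__":
    for name, mbps in hash_throughput().items():
        print(f"{name:>7}: {mbps:8.1f} MB/s")
```

Whatever the numbers turn out to be, they only matter relative to the rest of the work done per revision (fetching and decompressing text), so a per-hash gap may or may not be significant in practice.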
analyses, there's no need to change the online database tables.
Need? That's debatable, but one of the major motivators is the desire
to have hash values in database dumps (both for revert checks and as
checksums to verify correct data import/export). Both of those are
"offline" uses, but it is beneficial to have that information
precomputed and stored rather than frequently regenerated.
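The import/export check can be sketched with precomputed checksums. This sketch assumes a hypothetical tab-separated file with one `rev_id<TAB>sha1-hex` line per revision. That format is an illustration, not the actual dump format. The helper names write_checksums and verify_import are also hypothetical.

```python
import csv
import hashlib
import io
from typing import Iterable, List, TextIO, Tuple


def write_checksums(revisions: Iterable[Tuple[int, str]], out: TextIO) -> None:
    """Write one 'rev_id<TAB>sha1-hex' row per revision (hypothetical format)."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for rev_id, text in revisions:
        writer.writerow([rev_id, hashlib.sha1(text.encode("utf-8")).hexdigest()])


def verify_import(revisions: Iterable[Tuple[int, str]],
                  checksum_file: TextIO) -> List[int]:
    """Return ids of imported revisions whose text doesn't match the listed checksum."""
    expected = {int(row[0]): row[1]
                for row in csv.reader(checksum_file, delimiter="\t")}
    bad = []
    for rev_id, text in revisions:
        actual = hashlib.sha1(text.encode("utf-8")).hexdigest()
        # Missing ids also count as mismatches (expected.get returns None).
        if expected.get(rev_id) != actual:
            bad.append(rev_id)
    return bad


if __name__ == "__main__":
    buf = io.StringIO()
    write_checksums([(1, "a"), (2, "b")], buf)
    buf.seek(0)
    print(verify_import([(1, "a"), (2, "corrupted")], buf))
```

Computing the hashes once at dump time and shipping them alongside the data means each importer only has to hash its own copy, not regenerate the reference values.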
Why not in a separate file? There's no need to get permission from
anyone or mess with the schema to generate a file with revision ids
and checksums. If WMF won't host it at the regular dump location
(and I can't see why they wouldn't), you could host it at