On Sat, Sep 17, 2011 at 4:56 PM, Anthony <wikimail(a)inbox.org> wrote:
On Sat, Sep 17, 2011 at 6:46 PM, Robert Rohde
Is there a good reason to prefer SHA-1?
Both have weaknesses allowing one to construct a collision (with
Considerable effort? I can create an MD5 collision in a few minutes
on my home computer. Is there anything even remotely like this for
If I've been keeping up to date, the collision complexity for MD5 is
about 2^21 operations, which runs in a few seconds (not minutes), and
for SHA-1 it is down to about 2^52 operations with current results. The latter
represents about 100 cpu-years, which is within the realm of
supercomputers. That time will probably continue to come down if
people find ways to improve the attacks on SHA-1. (The existing
attacks usually require the ability to feed arbitrary binary strings
into the hash function. Given that both browsers and MediaWiki will
tend to reject binary data placed in an edit window, I'm not sure if
any of the existing attacks could be reliably applied to MediaWiki
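As a rough sanity check on the figures above, here is a back-of-the-envelope sketch. It assumes (and this is only an assumption) that one "operation" costs about the same in both attacks and that the quoted 100 cpu-years refers to a single core running continuously. The function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def implied_rate(log2_ops, cpu_years):
    """Operations per second implied by 2**log2_ops operations taking cpu_years."""
    return 2 ** log2_ops / (cpu_years * SECONDS_PER_YEAR)

# Rate implied by "2^52 operations is about 100 cpu-years" (SHA-1 figure).
sha1_rate = implied_rate(52, 100)

# Time for 2^21 operations (MD5 figure) at that same assumed rate.
md5_seconds = 2 ** 21 / sha1_rate

print(f"implied rate: {sha1_rate:,.0f} operations/second")
print(f"2^21 operations at that rate: {md5_seconds:.2f} seconds")
```

Under those assumptions the implied rate comes out at roughly 1.4 million operations per second, and 2^21 operations take on the order of a second or two, which is at least consistent with "a few seconds" for MD5.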
If collision attacks really matter we should use SHA-1. However, do
any of the proposed use cases care about whether someone might
intentionally inject a collision? In the proposed uses I've looked
at, it seems irrelevant. The intentional collision will get flagged
as a revert and the text leading to that collision would be discarded.
How is that a bad thing?
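To make the revert case concrete, here is a minimal sketch of hash-based revert detection. It is not MediaWiki code: the function names and the in-memory list of revision texts are hypothetical, and SHA-1 is used only as an example (MD5 would slot in the same way).

```python
import hashlib

def text_hash(text):
    """Hex digest of a revision's text (SHA-1 chosen only for illustration)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def find_reverts(revisions):
    """Return (index, earlier_index) pairs where a revision's hash matches
    the hash of an earlier revision in the same page history."""
    first_seen = {}   # hash -> index of the first revision with that hash
    reverts = []
    for i, text in enumerate(revisions):
        h = text_hash(text)
        if h in first_seen:
            reverts.append((i, first_seen[h]))
        else:
            first_seen[h] = i
    return reverts

history = ["a", "b", "a", "c", "b"]
print(find_reverts(history))
```

A deliberately injected collision would show up here as one more matching pair, i.e. it would simply be treated as a revert to the earlier revision. Note that this sketch would also flag a null edit (two adjacent identical revisions) as a revert.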
It's not a big deal, but if I understand prior comments correctly,
most of the existing offline infrastructure uses MD5, so I'm wondering
if there is a distinct use case for favoring SHA-1.
MD5 is shorter and, in my experience, about 25% faster to compute.
Personally I've tended to view MD5 as more than good enough in offline analyses.
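For anyone who wants to check the size and speed difference on their own machine, here is a small sketch using Python's standard hashlib and timeit. The helper names are made up for illustration, and the timing ratio will vary with hardware, library build and input size, so the 25% figure should be treated as one person's experience rather than something this snippet proves.

```python
import hashlib
import timeit

def digest_sizes():
    """Digest length in bytes for each algorithm."""
    return {name: hashlib.new(name).digest_size for name in ("md5", "sha1")}

def time_hash(name, data, repeats=200):
    """Total seconds spent hashing `data` `repeats` times with `name`."""
    return timeit.timeit(lambda: hashlib.new(name, data).digest(), number=repeats)

if __name__ == "__main__":
    print(digest_sizes())
    sample = b"x" * 100_000   # stand-in for a large-ish revision text
    md5_t = time_hash("md5", sample)
    sha1_t = time_hash("sha1", sample)
    print(f"md5: {md5_t:.4f}s  sha1: {sha1_t:.4f}s")
```

The size difference is fixed: an MD5 digest is 16 bytes (32 hex characters) and a SHA-1 digest is 20 bytes (40 hex characters).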
For offline analyses, there's no need to change the online database tables.
Need? That's debatable, but one of the major motivators is the desire
to have hash values in database dumps (both for revert checks and for
checksums on correct data import / export). Both of those are
"offline" uses, but it is beneficial to have that information
precomputed and stored rather than frequently regenerated.
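As an illustration of the import/export checksum use, here is a minimal sketch in which the hash is computed once at export time and re-checked on import. The record layout and field names are hypothetical and are not the actual dump schema. SHA-1 is used as the example hash.

```python
import hashlib

def export_record(rev_id, text):
    """Build a dump record with the hash precomputed and stored alongside the text."""
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
    return {"id": rev_id, "text": text, "sha1": digest}

def verify_record(record):
    """Recompute the hash on import and compare it with the stored value."""
    digest = hashlib.sha1(record["text"].encode("utf-8")).hexdigest()
    return digest == record["sha1"]

record = export_record(1, "Some revision text")
print(verify_record(record))          # intact record

damaged = dict(record, text="Some revision tex")
print(verify_record(damaged))         # text altered after export
```

With the hash stored in the dump, consumers only need to recompute it when they want to verify an import. Revert checks can compare the stored values directly without touching the text at all.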