On Sat, Sep 17, 2011 at 4:56 PM, Anthony wikimail@inbox.org wrote:
On Sat, Sep 17, 2011 at 6:46 PM, Robert Rohde rarohde@gmail.com wrote:
Is there a good reason to prefer SHA-1?
Both have weaknesses allowing one to construct a collision (with considerable effort)
Considerable effort? I can create an MD5 collision in a few minutes on my home computer. Is there anything even remotely like this for SHA-1?
If I've been keeping up to date, the collision complexity for MD5 is about 2^21 operations, which runs in a few seconds (not minutes); for SHA-1 it is down to about 2^52 with current results. The latter represents about 100 CPU-years, which is within the realm of supercomputers. That time will probably continue to come down if people find ways to improve the attacks on SHA-1. (The existing attacks usually require the ability to feed arbitrary binary strings into the hash function. Given that both browsers and MediaWiki will tend to reject binary data placed in an edit window, I'm not sure whether any of the existing attacks could be reliably applied to MediaWiki editing.)
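As a rough sanity check on those figures, here is a back-of-the-envelope sketch. It assumes (and this is only an assumption) that an "operation" costs about the same in both attacks, derives the per-CPU rate implied by "2^52 operations in about 100 CPU-years", and then asks how long 2^21 operations would take at that rate. The function names are made up for illustration.

```python
# Back-of-the-envelope check of the collision-cost figures quoted above.
# Assumption: one "operation" costs roughly the same in the MD5 and SHA-1
# attacks, which is a simplification and not something the attacks guarantee.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def implied_rate(log2_ops, cpu_years):
    """Operations per second per CPU implied by 2**log2_ops ops in cpu_years."""
    return 2 ** log2_ops / (cpu_years * SECONDS_PER_YEAR)


def seconds_at_rate(log2_ops, ops_per_second):
    """Time in seconds to perform 2**log2_ops operations at the given rate."""
    return 2 ** log2_ops / ops_per_second


rate = implied_rate(52, 100)          # SHA-1: 2^52 ops in ~100 CPU-years
md5_seconds = seconds_at_rate(21, rate)  # MD5: 2^21 ops at the same rate

print(f"implied rate: {rate:,.0f} operations per second per CPU")
print(f"2^21 operations at that rate: {md5_seconds:.2f} seconds")
```

Under that assumption the implied rate is on the order of a million operations per second, and 2^21 operations come out at a second or two, which is at least consistent with "a few seconds" for MD5.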
If collision attacks really matter we should use SHA-1. However, do any of the proposed use cases care about whether someone might intentionally inject a collision? In the proposed uses I've looked at, it seems irrelevant. The intentional collision will get flagged as a revert and the text leading to that collision would be discarded. How is that a bad thing?
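To make the revert-check use case concrete, here is a minimal sketch of hash-based revert detection. It is not how MediaWiki or any existing tool does it; `find_reverts` and the sample history are hypothetical, and the algorithm name is just a parameter so MD5 and SHA-1 can be swapped.

```python
import hashlib


def find_reverts(revision_texts, algorithm="md5"):
    """Return (revision_index, earlier_index) pairs for revisions whose hash
    matches an earlier revision, i.e. likely reverts.

    Hypothetical helper for illustration; revision_texts is a list of strings
    in chronological order.
    """
    first_seen = {}  # hex digest -> index of the first revision with that digest
    reverts = []
    for index, text in enumerate(revision_texts):
        digest = hashlib.new(algorithm, text.encode("utf-8")).hexdigest()
        if digest in first_seen:
            reverts.append((index, first_seen[digest]))
        else:
            first_seen[digest] = index
    return reverts


history = ["original text", "vandalized text", "original text"]
print(find_reverts(history))  # [(2, 0)]
```

In a scheme like this, a deliberately colliding text would simply show up as one more "revert" to the earlier revision, which is the case described above.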
It's not a big deal, but if I understand prior comments correctly, most of the existing offline infrastructure uses MD5, so I'm wondering if there is a distinct use case for favoring SHA-1.
MD5 is shorter and in my experience about 25% faster to compute.
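The size difference is easy to confirm, and the speed difference can be measured locally. The sketch below uses only Python's standard `hashlib` and `timeit`; the helper names and the payload size are arbitrary choices for illustration. Timings depend heavily on the machine and the library build, so no particular ratio is claimed here.

```python
import hashlib
import timeit


def digest_bits(algorithm):
    """Digest length in bits for a hashlib algorithm name."""
    return hashlib.new(algorithm).digest_size * 8


def time_hash(algorithm, payload, repeats=200):
    """Total seconds to hash the payload `repeats` times (machine dependent)."""
    return timeit.timeit(
        lambda: hashlib.new(algorithm, payload).digest(), number=repeats
    )


payload = b"x" * 100_000  # arbitrary stand-in for a large revision text

for name in ("md5", "sha1"):
    hex_length = len(hashlib.new(name, payload).hexdigest())
    elapsed = time_hash(name, payload)
    print(f"{name}: {digest_bits(name)} bits, {hex_length} hex chars, "
          f"{elapsed:.4f} s for 200 hashes")
```

MD5 digests are 128 bits (32 hex characters) and SHA-1 digests are 160 bits (40 hex characters); the timing column is whatever your own hardware reports.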
Personally I've tended to view MD5 as more than good enough in offline analyses.
For offline analyses, there's no need to change the online database tables.
Need? That's debatable, but one of the major motivators is the desire to have hash values in database dumps (both for revert checks and for checksums on correct data import / export). Both of those are "offline" uses, but it is beneficial to have that information precomputed and stored rather than frequently regenerated.
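For the checksum use, a sketch of what "precomputed and stored" could look like on the consuming side: the exporter stores a hash next to the text, and the importer recomputes it to confirm the data arrived intact. The record layout and the function names here are hypothetical, not the actual dump format.

```python
import hashlib


def export_record(text, algorithm="sha1"):
    """Build a dump record carrying a precomputed hash (hypothetical layout)."""
    digest = hashlib.new(algorithm, text.encode("utf-8")).hexdigest()
    return {"text": text, "algorithm": algorithm, "hash": digest}


def verify_record(record):
    """Recompute the hash on import and compare it with the stored value."""
    recomputed = hashlib.new(
        record["algorithm"], record["text"].encode("utf-8")
    ).hexdigest()
    return recomputed == record["hash"]


record = export_record("Some revision text")
print(verify_record(record))   # True

damaged = dict(record, text="Some revision tex")  # simulate a truncated import
print(verify_record(damaged))  # False
```

The same stored hash could then be reused for revert checks without rehashing every revision on each analysis run.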
-Robert Rohde