New subject: Adding MD5 / SHA1 column to revision table (discussing r94289)

18 Sep 2011


It is meaningless to talk about cryptography without a threat model, just as Robert says. Is anybody actually attacking us? Or are we worried about accidental collisions?
-----Original message-----
From: Robert Rohde rarohde@gmail.com
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Sent: Sun, Sep 18, 2011 05:56:15 GMT+00:00
Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)
On Sat, Sep 17, 2011 at 4:56 PM, Anthony wikimail@inbox.org wrote:
...
On Sat, Sep 17, 2011 at 6:46 PM, Robert Rohde rarohde@gmail.com wrote:
...
Is there a good reason to prefer SHA-1?
Both have weaknesses allowing one to construct a collision (with
considerable effort)
Considerable effort?  I can create an MD5 collision in a few minutes
on my home computer.  Is there anything even remotely like this for
SHA-1?
If I've been keeping up to date, the collision complexity for MD5 is
about 2^21 operations, and runs in a few seconds (not minutes); and
for SHA-1 down to about 2^52 with current results.  The latter
represents about 100 cpu-years, which is within the realm of
supercomputers.  That time will probably continue to come down if
people find ways to improve the attacks on SHA-1.  (The existing
attacks usually require the ability to feed arbitrary binary strings
into the hash function.  Given that both browsers and Mediawiki will
tend to reject binary data placed in an edit window, I'm not sure if
any of the existing attacks could be reliably applied to Mediawiki
editing.)
If collision attacks really matter we should use SHA-1.  However, do
any of the proposed use cases care about whether someone might
intentionally inject a collision?  In the proposed uses I've looked
at, it seems irrelevant.  The intentional collision will get flagged
as a revert and the text leading to that collision would be discarded.
How is that a bad thing?
It's not a big deal, but if I understand prior comments correctly,
most of the existing offline infrastructure uses MD5, so I'm wondering
if there is a distinct use case for favoring SHA-1.
...
...
MD5 is shorter and in my experience about 25% faster to compute.
Personally I've tended to view MD5 as more than good enough in offline analyses.
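The size and speed comparison above can be checked locally. The following is a minimal sketch using Python's standard hashlib and timeit modules; the helper names (digest_info, time_hash) and the sample text are made up for illustration, and the timings depend heavily on the machine and the hashlib build, so the 25% figure should be treated as one person's measurement rather than a constant.

```python
import hashlib
import timeit

# Hypothetical sample standing in for one revision's wikitext.
SAMPLE = ("Example revision text. " * 4000).encode("utf-8")


def digest_info(name, data):
    """Return (digest size in bytes, hex digest) for the named algorithm."""
    h = hashlib.new(name, data)
    return h.digest_size, h.hexdigest()


def time_hash(name, data, number=200):
    """Return total seconds taken to hash `data` `number` times."""
    return timeit.timeit(lambda: hashlib.new(name, data).digest(), number=number)


for algo in ("md5", "sha1"):
    size, _ = digest_info(algo, SAMPLE)
    print(algo, size, "bytes per digest,", round(time_hash(algo, SAMPLE), 4), "s")
```

MD5 digests are 16 bytes and SHA-1 digests are 20 bytes, so the storage difference per revision row is 4 bytes in binary form (8 characters in hex).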
For offline analyses, there's no need to change the online database tables.
Need?  That's debatable, but one of the major motivators is the desire
to have hash values in database dumps (both for revert checks and for
checksums on correct data import / export).  Both of those are
"offline" uses, but it is beneficial to have that information
precomputed and stored rather than frequently regenerated.
-Robert Rohde
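To illustrate the revert-check use case discussed above, here is a rough sketch of how stored hashes could flag reverts: a revision whose hash matches an earlier revision of the same page is treated as a revert to it. This is not MediaWiki's implementation; the function names and the (rev_id, text) input format are assumptions for the example, and with a precomputed hash column the hashing step would be replaced by reading the stored value.

```python
import hashlib


def sha1_hex(text):
    """Hex SHA-1 of the revision text, encoded as UTF-8."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_reverts(revisions):
    """Given (rev_id, text) pairs in chronological order, return a list of
    (rev_id, earlier_rev_id) pairs where the text hash matches the first
    earlier revision with the same hash."""
    first_seen = {}  # digest -> rev_id of the first revision with that digest
    reverts = []
    for rev_id, text in revisions:
        digest = sha1_hex(text)
        if digest in first_seen:
            reverts.append((rev_id, first_seen[digest]))
        else:
            first_seen[digest] = rev_id
    return reverts


# Hypothetical page history: revision 3 restores the text of revision 1.
history = [(1, "Original text"), (2, "Vandalised text"), (3, "Original text")]
print(find_reverts(history))  # [(3, 1)]
```

Under this scheme an intentional collision would simply show up as one more matching pair, which is the behaviour described above: the colliding text gets flagged as a revert. Note that a null edit (identical consecutive text) would also be flagged by this sketch.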