On 09/18/2011 08:55 AM, Robert Rohde wrote:
> people find ways to improve the attacks on SHA-1.
> attacks usually require the ability to feed arbitrary binary strings
> into the hash function. Given that both browsers and MediaWiki will
> tend to reject binary data placed in an edit window, I'm not sure if
> any of the existing attacks could be reliably applied to MediaWiki
I'm pretty sure MediaWiki will accept any data that's valid UTF-8,
modulo canonicalization perhaps. I'm not very familiar with the MD5 and
SHA-1 collision attacks, but I wouldn't be surprised if at least some of
them could be modified to use, say, only 7-bit ASCII.
> If collision attacks really matter we should use SHA-1. However, do
> any of the proposed use cases care about whether someone might
> intentionally inject a collision? In the proposed uses I've looked
> at, it seems irrelevant. The intentional collision will get flagged
> as a revert and the text leading to that collision would be discarded.
> How is that a bad thing?
Well, if you could predict the content of a version that someone (say, a
bot) was likely to save sometime in the future, and created a different
revision with the same hash (say, in the sandbox or in your userspace,
so that people wouldn't notice it) in advance...
Depending on just what page was targeted, the consequences could range
from a minor annoyance to site-wide JS injection.
Anyway, I wouldn't suggest using either MD5 or SHA-1: both have known
attacks, and it's a fundamental rule of cryptography that attacks always
get better over time, never worse. Let's _at least_ use SHA-2.
(Actually, I'd suggest designing the format so that we can change hash
functions in the future without having to rehash every old revision
immediately. For example, we might store a hash computed using SHA-256
as "sha256:d9014c4624844aa..." or something like that.)
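
To make the prefixed format concrete, here is a rough sketch in Python. It is only an illustration, not a proposal for actual MediaWiki code: the function names (tagged_hash, matches_stored) and the choice of hex encoding are my own assumptions. The point is that the stored string names its own algorithm, so old revisions hashed with one function stay verifiable after the default changes.

```python
import hashlib
import hmac

# Assumption: the default algorithm for newly saved revisions.
DEFAULT_ALGO = "sha256"


def tagged_hash(text: str, algo: str = DEFAULT_ALGO) -> str:
    """Return '<algo>:<hex digest>' for the UTF-8 encoding of text."""
    digest = hashlib.new(algo, text.encode("utf-8")).hexdigest()
    return f"{algo}:{digest}"


def matches_stored(text: str, stored: str) -> bool:
    """Check text against a stored hash, using whatever algorithm
    the stored value's prefix names (hypothetical helper)."""
    algo, sep, _digest = stored.partition(":")
    if not sep:
        raise ValueError("stored hash has no algorithm prefix")
    # Recompute with the algorithm recorded in the prefix, then compare.
    return hmac.compare_digest(tagged_hash(text, algo), stored)


if __name__ == "__main__":
    old = tagged_hash("some revision text", "sha1")  # legacy row
    new = tagged_hash("some revision text")          # current default
    print(old)
    print(new)
    print(matches_stored("some revision text", old))
    print(matches_stored("some revision text", new))
```

With something like this, switching the default later only affects new rows; existing rows can be rehashed lazily, or never.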