On 09/18/2011 08:55 AM, Robert Rohde wrote:
people find ways to improve the attacks on SHA-1. (The existing attacks usually require the ability to feed arbitrary binary strings into the hash function. Given that both browsers and Mediawiki will tend to reject binary data placed in an edit window, I'm not sure if any of the existing attacks could be reliably applied to Mediawiki editing.)
I'm pretty sure MediaWiki will accept any data that's valid UTF-8, modulo canonicalization perhaps. I'm not very familiar with the MD5 and SHA-1 collision attacks, but I wouldn't be surprised if at least some of them could be modified to use, say, only 7-bit ASCII.
If collision attacks really matter we should use SHA-1. However, do any of the proposed use cases care about whether someone might intentionally inject a collision? In the proposed uses I've looked at, it seems irrelevant. The intentional collision will get flagged as a revert and the text leading to that collision would be discarded. How is that a bad thing?
Well, if you could predict the content of a version that someone (say, a bot) was likely to save sometime in the future, and created a different revision with the same hash (say, in the sandbox or in your userspace, so that people wouldn't notice it) in advance...
Depending on just what page was targeted, the consequences could range from a minor annoyance to site-wide JS injection.
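To make that scenario concrete, here is a rough sketch in Python. Everything in it is hypothetical: `weak_hash` is a deliberately broken stand-in (the first two hex digits of SHA-256) so that a collision can be found by brute force in a moment, and `RevisionStore` is a toy model that assumes text is looked up purely by hash. It is not how MediaWiki stores revisions.

```python
import hashlib


def weak_hash(text: str) -> str:
    """Toy hash: first 2 hex digits of SHA-256, so collisions are trivial to find."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:2]


def find_collision(target: str, prefix: str) -> str:
    """Brute-force a different string with the same weak_hash as target."""
    wanted = weak_hash(target)
    i = 0
    while True:
        candidate = f"{prefix}{i}"
        if candidate != target and weak_hash(candidate) == wanted:
            return candidate
        i += 1


class RevisionStore:
    """Hypothetical store that treats 'same hash' as 'same text' (a revert)."""

    def __init__(self) -> None:
        self.by_hash: dict[str, str] = {}

    def save(self, text: str) -> tuple[str, bool]:
        """Return (text actually kept, whether the save was treated as a revert)."""
        h = weak_hash(text)
        if h in self.by_hash:
            # Hash already known: the incoming text is discarded.
            return self.by_hash[h], True
        self.by_hash[h] = text
        return text, False


store = RevisionStore()
predicted = "output the bot is expected to save later"
planted = find_collision(predicted, prefix="<script>evil()</script> ")

store.save(planted)                      # attacker saves first, somewhere quiet
kept, is_revert = store.save(predicted)  # the bot's save arrives afterwards
print(is_revert, kept == planted)
```

With a real hash the brute-force step is the hard part, of course; the point is only that once a collision exists, a hash-only check can't tell which of the two texts was meant.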
Anyway, I wouldn't suggest using either MD5 or SHA-1: both have known attacks, and it's a fundamental rule of cryptography that attacks always get better over time, never worse. Let's _at least_ use SHA-2.
(Actually, I'd suggest designing the format so that we can change hash functions in the future without having to rehash every old revision immediately. For example, we might store a hash computed using SHA-256 as "sha256:d9014c4624844aa..." or something like that.)
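A minimal sketch of what such a self-describing format could look like, in Python. The function names (`tag_hash`, `matches`) and the set of supported prefixes are my own assumptions for illustration, not an existing MediaWiki interface.

```python
import hashlib

# Hypothetical set of accepted prefixes; each must be a name hashlib.new() knows.
SUPPORTED_ALGOS = {"sha1", "sha256"}
DEFAULT_ALGO = "sha256"


def tag_hash(text: str, algo: str = DEFAULT_ALGO) -> str:
    """Hash the UTF-8 encoding of text and prefix the hex digest with the algorithm name."""
    if algo not in SUPPORTED_ALGOS:
        raise ValueError(f"unsupported hash algorithm: {algo}")
    digest = hashlib.new(algo, text.encode("utf-8")).hexdigest()
    return f"{algo}:{digest}"


def matches(text: str, stored: str) -> bool:
    """Check text against a stored value, using whichever algorithm the prefix names."""
    algo, sep, _digest = stored.partition(":")
    if not sep:
        raise ValueError("stored hash has no algorithm prefix")
    return tag_hash(text, algo) == stored


old_row = tag_hash("some revision text", "sha1")  # written before a switch
new_row = tag_hash("some revision text")          # written after a switch
print(matches("some revision text", old_row), matches("some revision text", new_row))
```

Since every stored value names its own algorithm, old rows stay verifiable after the default changes, and they can be rehashed lazily (or never) instead of all at once.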