On Sat, Sep 17, 2011 at 8:26 AM, Roan Kattouw <roan.kattouw(a)gmail.com> wrote:
> Minor detail: I think it's more likely we'll use SHA-1 hashes rather
> than MD5 hashes.
Is there a good reason to prefer SHA-1?
Both have weaknesses that allow one to construct a collision (with
considerable effort), but I don't see why that would matter for the
proposed use.
With only about 1 billion revisions in the collective databases, the
odds of an accidental collision with either MD5 or SHA-1 are
infinitesimal (less than 1 in 10^18 for the weaker MD5).
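As a rough check of that estimate, here is a minimal Python sketch of the standard birthday approximation. It assumes hash outputs behave like uniformly random values, and the function name and the round figure of 10^9 revisions are illustrative rather than taken from any real database.

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday approximation: p ~= 1 - exp(-n(n-1)/2 / 2^bits).

    Assumes the hash output is uniformly distributed (an idealization).
    """
    pairs = n_items * (n_items - 1) / 2
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-pairs / 2.0 ** hash_bits)

N_REVISIONS = 10**9  # assumed round figure for the revision count

p_md5 = collision_probability(N_REVISIONS, 128)   # MD5: 128-bit digest
p_sha1 = collision_probability(N_REVISIONS, 160)  # SHA-1: 160-bit digest

print(f"MD5:   {p_md5:.3e}")
print(f"SHA-1: {p_sha1:.3e}")
```

Under these assumptions the MD5 figure comes out well below 1 in 10^18, and the SHA-1 figure is smaller still.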
MD5 is shorter and in my experience about 25% faster to compute.
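For anyone who wants to check the size and speed difference on their own hardware, a small sketch using Python's standard hashlib follows. The helper name, sample size, and round count are arbitrary choices for illustration, and the relative timing will vary with the platform and the underlying crypto library, so no particular ratio should be expected.

```python
import hashlib
import time

def time_hash(name: str, data: bytes, rounds: int = 50) -> float:
    """Return the seconds spent hashing `data` `rounds` times with `name`."""
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(name, data).digest()
    return time.perf_counter() - start

# Arbitrary 100 KB sample standing in for revision text.
sample = b"x" * 100_000

md5_hex = hashlib.md5(sample).hexdigest()    # 128 bits -> 32 hex characters
sha1_hex = hashlib.sha1(sample).hexdigest()  # 160 bits -> 40 hex characters

md5_seconds = time_hash("md5", sample)
sha1_seconds = time_hash("sha1", sample)

print(f"MD5:   {len(md5_hex)} hex chars, {md5_seconds:.4f} s")
print(f"SHA-1: {len(sha1_hex)} hex chars, {sha1_seconds:.4f} s")
```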
Personally I've tended to view MD5 as more than good enough in offline analyses.
-Robert Rohde