On Sat, Sep 17, 2011 at 8:26 AM, Roan Kattouw <roan.kattouw@gmail.com> wrote:
> Minor detail: I think it's more likely we'll use SHA-1 hashes rather than MD5 hashes.
Is there a good reason to prefer SHA-1?
Both have weaknesses allowing one to construct a collision (with considerable effort), but I don't see why that would matter for the proposed use.
With only about 1 billion revisions in the collective databases, the odds of an accidental collision with either MD5 or SHA-1 are infinitesimal (less than 1 in 10^18 for the weaker MD5).
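For anyone who wants to check that figure, here is a rough sketch of the usual birthday-bound estimate, p ~ n(n-1)/2 divided by 2^bits. It assumes hash outputs are uniformly distributed and that p is tiny. The function name is made up for illustration, and the 1 billion revision count is just the round number from above.

```python
def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday-bound approximation: (number of pairs) / (number of hash values).

    Only a good approximation while the result is much smaller than 1.
    """
    pairs = n_items * (n_items - 1) / 2
    return pairs / 2.0 ** hash_bits


n = 10 ** 9  # roughly 1 billion revisions (round figure, an assumption)
p_md5 = collision_probability(n, 128)   # MD5 digests are 128 bits
p_sha1 = collision_probability(n, 160)  # SHA-1 digests are 160 bits

print(f"MD5   accidental collision probability: {p_md5:.2e}")
print(f"SHA-1 accidental collision probability: {p_sha1:.2e}")
```

Under these assumptions the MD5 estimate comes out on the order of 10^-21, comfortably below the 1 in 10^18 bound, and the SHA-1 estimate is smaller still.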
MD5 is shorter and in my experience about 25% faster to compute.
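A quick way to check both points on your own machine is a small timing sketch like the one below. The 1 MiB payload is an arbitrary stand-in for revision text, and the helper name is made up. The speed ratio depends heavily on the CPU and on the crypto library Python is linked against, so treat the 25% figure as my experience rather than something this will necessarily reproduce.

```python
import hashlib
import timeit

# Arbitrary 1 MiB payload standing in for revision text (an assumption).
payload = b"x" * (1 << 20)


def time_hash(name: str, repeats: int = 20) -> float:
    """Return total seconds to hash the payload `repeats` times with algorithm `name`."""
    return timeit.timeit(lambda: hashlib.new(name, payload).digest(), number=repeats)


md5_size = hashlib.md5().digest_size    # digest length in bytes
sha1_size = hashlib.sha1().digest_size  # digest length in bytes
t_md5 = time_hash("md5")
t_sha1 = time_hash("sha1")

print(f"MD5:   {md5_size} bytes per digest, {t_md5:.4f} s")
print(f"SHA-1: {sha1_size} bytes per digest, {t_sha1:.4f} s")
print(f"SHA-1 time / MD5 time: {t_sha1 / t_md5:.2f}")
```

The digest sizes are fixed by the algorithms (16 bytes for MD5, 20 for SHA-1); only the timings vary from machine to machine.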
Personally I've tended to view MD5 as more than good enough in offline analyses.
-Robert Rohde