Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table

20 Sep 2011


      ...
...
I ran some benchmarks on one of the WMF machines. The input I used is
a 137.5 MB (144,220,582 bytes) OGV file that someone asked me to
upload to Commons recently. For each benchmark, I hashed the file 25
times and computed the average running time.
MD5: 393 ms
SHA-1: 404 ms
SHA-256: 1281 ms
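
For anyone who wants to repeat this kind of measurement, here is a minimal sketch in Python. It is not the script used for the numbers above; the function name, the 16 MiB zero-filled placeholder input, and the use of hashlib are all assumptions made for illustration. It hashes the same buffer 25 times per algorithm and reports the average.

```python
import hashlib
import time


def average_hash_time_ms(data: bytes, algorithm: str, runs: int = 25) -> float:
    """Hash `data` `runs` times and return the mean wall-clock time in ms."""
    start = time.perf_counter()
    for _ in range(runs):
        # hashlib.new() accepts the algorithm name and the initial data.
        hashlib.new(algorithm, data).digest()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


if __name__ == "__main__":
    # Placeholder input (assumption): 16 MiB of zero bytes,
    # not the OGV file mentioned in the post.
    payload = bytes(16 * 1024 * 1024)
    for name in ("md5", "sha1", "sha256"):
        print(f"{name}: {average_hash_time_ms(payload, name):.1f} ms")
```

Absolute timings will depend on the machine and on how the hash library was built, so only the relative ordering is likely to be comparable.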
Can we keep some perspective please? MD5 is plenty good enough for the 
purposes discussed here. It's fast and, almost as important, it is well 
supported by many OSs, libraries, etc. As far as collisions go, there are 
plenty of easy solutions, such as:
* Check for a collision before allowing a new revision, and handle it 
specially if one is found (this covers the second-preimage attack)
* When reverting, do a SELECT COUNT(*) ... WHERE md5 = ? and then do 
something more advanced when more than one match is found
* Use the checksum to find the revision fast, but still do a full byte 
comparison.
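
The last option can be sketched roughly as follows. This is only an illustration using an in-memory SQLite database; the table and column names (revision, rev_id, rev_text, rev_md5) are hypothetical and are not taken from the actual MediaWiki schema. The indexed checksum narrows the candidates, and the byte comparison decides.

```python
import hashlib
import sqlite3


def md5_hex(text: bytes) -> str:
    """Return the 32-character hexadecimal MD5 digest of `text`."""
    return hashlib.md5(text).hexdigest()


def find_identical_revisions(conn: sqlite3.Connection, text: bytes) -> list:
    """Use the checksum to find candidates, then confirm byte for byte."""
    rows = conn.execute(
        "SELECT rev_id, rev_text FROM revision WHERE rev_md5 = ? ORDER BY rev_id",
        (md5_hex(text),),
    ).fetchall()
    # A matching checksum is only a hint; the full comparison is what
    # makes a colliding but different revision harmless.
    return [rev_id for rev_id, stored in rows if stored == text]


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny hypothetical revision table for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revision ("
        " rev_id INTEGER PRIMARY KEY,"
        " rev_text BLOB NOT NULL,"
        " rev_md5 CHAR(32) NOT NULL)"
    )
    conn.execute("CREATE INDEX rev_md5_idx ON revision (rev_md5)")
    for rev_id, text in [(1, b"first"), (2, b"second"), (3, b"first")]:
        conn.execute(
            "INSERT INTO revision VALUES (?, ?, ?)",
            (rev_id, text, md5_hex(text)),
        )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(find_identical_revisions(conn, b"first"))
```

With this approach a collision costs one extra comparison at most, and it can never cause the wrong revision to be treated as identical.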
I've only seen one real attack scenario mentioned in this thread - 
that of someone creating a new page with the same checksum as an existing 
one, for purposes of messing up the reversion system. Are there other 
attacks we should worry about?
I'm also of the opinion that we should just store the checksum as 
CHAR(32), unless someone thinks space is really at that much of a 
premium. The big advantage of 32 characters (i.e. 0-9a-f, aka 
hexadecimal) is that it's the standard way to represent an MD5 digest, 
which makes using common tools (e.g. md5sum) much easier.
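
To make the space trade-off concrete, here is a small sketch (the function name is made up for illustration) showing the two forms of the same MD5 digest: the 32-character hexadecimal string that tools like md5sum print, and the 16 raw bytes a binary column would hold.

```python
import hashlib


def md5_representations(data: bytes):
    """Return (hex string, raw bytes) for the MD5 digest of `data`."""
    digest = hashlib.md5(data)
    return digest.hexdigest(), digest.digest()


if __name__ == "__main__":
    hex_form, raw_form = md5_representations(b"")
    print(hex_form)        # the form that fits a CHAR(32) column
    print(len(hex_form))   # 32 characters
    print(len(raw_form))   # 16 bytes
    # The two forms convert losslessly in either direction.
    print(bytes.fromhex(hex_form) == raw_form)
```

So the hexadecimal form costs twice the storage of the raw digest, in exchange for values that can be pasted straight into a query or compared against command-line output.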
-- 
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
