Adding MD5 / SHA1 column to revision table (discussing r94289) - Wikitech-l

18 Aug 2011

Hi!
I am starting this thread because Brion's revision r94541 reverted
r94289 [0], stating "core schema change with no discussion" [1].
Bugs 21860 [2] and 25312 [3] advocate for the inclusion of a hash
column (either md5 or sha1) in the revision table. The primary use
case of this column will be to assist in detecting reverts. I don't think
that data integrity is the primary reason for adding this column. The
huge advantage of having such a column is that it will no longer be
necessary to analyze full dumps to detect reverts; instead, you can
look for reverts in the stub dump file by looking for the same hash
within a single page. The fact that there is a theoretical chance of a
collision is not very important IMHO; it would just mean that in very
rare cases our research would flag an edit as reverted when it
was not. The two bug reports contain quite long discussions and this
feature has also been discussed internally quite extensively but oddly
enough it hasn't happened yet on the mailing list.
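
To make the revert-detection idea concrete, here is a minimal sketch in
Python. It assumes we already have, for a single page, the list of
revisions in chronological order together with a hash of each revision's
text (which is what a stub dump with a hash column would provide). The
names find_reverts and sha1_hex, the tuple layout, and the use of a hex
SHA-1 digest are my own assumptions for illustration, not the actual
schema or dump format.

```python
import hashlib


def sha1_hex(text):
    """Hash revision text (assumption: hex SHA-1 of the UTF-8 bytes)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_reverts(revisions):
    """Find identity reverts within one page.

    revisions: list of (rev_id, text_hash) in chronological order.
    Returns a list of (reverting_rev_id, restored_rev_id, reverted_rev_ids).
    """
    last_seen = {}  # text_hash -> index of the latest revision with that hash
    reverts = []
    for i, (rev_id, text_hash) in enumerate(revisions):
        if text_hash in last_seen:
            j = last_seen[text_hash]
            # Everything strictly between the two identical revisions
            # was undone by the current revision.
            reverted = [r for r, _ in revisions[j + 1:i]]
            if reverted:  # skip consecutive identical revisions
                reverts.append((rev_id, revisions[j][0], reverted))
        last_seen[text_hash] = i
    return reverts


# Hypothetical page history: revision 3 restores the text of revision 1.
history = [
    (1, sha1_hex("Good article text.")),
    (2, sha1_hex("Vandalised text.")),
    (3, sha1_hex("Good article text.")),
]
print(find_reverts(history))  # [(3, 1, [2])]
```

A hash collision would show up here as a false match in last_seen, which
is the rare false positive mentioned above.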

So let's have a discussion!

[0] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94289
[1] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94541
[2] https://bugzilla.wikimedia.org/show_bug.cgi?id=21860
[3] https://bugzilla.wikimedia.org/show_bug.cgi?id=25312

Best,

Diederik