Hi,
It might be good to keep a private hash in parallel with the MD5 public hash.
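This is one possible reading of "private hash", and it is an assumption because the message does not name a construction. The idea is a keyed HMAC computed with a server-side secret and stored alongside the public MD5. A minimal Python sketch follows; `SECRET_KEY` and the helper names are hypothetical:

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret. In practice it would be generated once and
# never published; here it is random per process purely for illustration.
SECRET_KEY = secrets.token_bytes(32)


def public_hash(text: bytes) -> str:
    """MD5 digest that could be published, e.g. in dumps."""
    return hashlib.md5(text).hexdigest()


def private_hash(text: bytes, key: bytes = SECRET_KEY) -> str:
    """Keyed HMAC-SHA256 digest that only the key holder can compute."""
    return hmac.new(key, text, hashlib.sha256).hexdigest()


def same_text(a: bytes, b: bytes) -> bool:
    """Treat two texts as identical only if BOTH digests agree."""
    return (public_hash(a) == public_hash(b)
            and hmac.compare_digest(private_hash(a), private_hash(b)))
```

An outsider who does not know the key can engineer an MD5 collision, but that alone would not make two different texts compare equal under `same_text`.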
cheers, Jamie
----- Original Message -----
From: wikitech-l-request@lists.wikimedia.org
Date: Sunday, September 18, 2011 3:12 pm
Subject: Wikitech-l Digest, Vol 98, Issue 30
To: wikitech-l@lists.wikimedia.org
Send Wikitech-l mailing list submissions to wikitech-l@lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit https://lists.wikimedia.org/mailman/listinfo/wikitech-l or, via email, send a message with subject or body 'help' to wikitech-l-request@lists.wikimedia.org
You can reach the person managing the list at wikitech-l-owner@lists.wikimedia.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of Wikitech-l digest..."
Today's Topics:
   1. Re: Adding MD5 / SHA1 column to revision table (discussing r94289) (Anthony)
   2. Fwd: Adding MD5 / SHA1 column to revision table (discussing r94289) (Anthony)
   3. Re: Adding MD5 / SHA1 column to revision table (discussing r94289) (Chad)
   4. Re: Adding MD5 / SHA1 column to revision table (discussing r94289) (Anthony)
   5. Re: Adding MD5 / SHA1 column to revision table (discussing r94289) (Chad)
   6. Re: Fwd: Adding MD5 / SHA1 column to revision table (discussing r94289) (Roan Kattouw)
   7. Re: Adding MD5 / SHA1 column to revision table (discussing r94289) (Platonides)
   8. Re: Adding MD5 / SHA1 column to revision table (discussing r94289) (Anthony)
   9. Re: Adding MD5 / SHA1 column to revision table (discussing r94289) (Anthony)
  10. Re: Fwd: Adding MD5 / SHA1 column to revision table (discussing r94289) (Anthony)
Message: 1
Date: Sun, 18 Sep 2011 16:57:22 -0400
From: Anthony wikimail@inbox.org
Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Message-ID: <CAPreJLR8Rhut8gdqizxmDuo5-CAd3Yi_S-G478071LD=fg9XTQ@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-7
On Sun, Sep 18, 2011 at 2:33 AM, Ariel T. Glenn ariel@wikimedia.org wrote:
On Sat, 17-09-2011 at 22:55 -0700, Robert Rohde wrote:
On Sat, Sep 17, 2011 at 4:56 PM, Anthony wikimail@inbox.org wrote:
<snip>
For offline analyses, there's no need to change the online database tables.
Need? That's debatable, but one of the major motivators is the desire to have hash values in database dumps (both for revert checks and for checksums on correct data import / export). Both of those are "offline" uses, but it is beneficial to have that information precomputed and stored rather than frequently regenerated.
If we don't have it in the online database tables, this defeats the purpose of having the value in there at all, for the purpose of generating the XML dumps.
Recall that the dumps are generated in two passes; in the first pass we retrieve from the db and record all of the metadata about revisions, and in the second (time-consuming) pass we re-use the text of the revisions from a previous dump file if the text is in there. We want to compare the hash of that text against what the online database says the hash is; if they don't match, we want to fetch the live copy.
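The second-pass check described above can be sketched in Python as follows. The dictionaries and the `fetch_live` callback are hypothetical stand-ins for the real dump machinery. SHA-1 is assumed as the stored hash, since the choice of algorithm is exactly what this thread is debating:

```python
import hashlib
from typing import Callable, Dict


def sha1_hex(text: bytes) -> str:
    return hashlib.sha1(text).hexdigest()


def second_pass_text(rev_id: int,
                     db_hashes: Dict[int, str],
                     prev_dump_texts: Dict[int, bytes],
                     fetch_live: Callable[[int], bytes]) -> bytes:
    """Return a revision's text for the dump, reusing the previous dump when safe.

    db_hashes:       hash the online database records for each revision
                     (gathered with the metadata in the first pass)
    prev_dump_texts: texts available from an earlier dump file
    fetch_live:      hypothetical callback that reads the text from live storage
    """
    old = prev_dump_texts.get(rev_id)
    if old is not None and sha1_hex(old) == db_hashes.get(rev_id):
        return old              # previous dump agrees with the database
    return fetch_live(rev_id)   # missing or mismatched: fetch the live copy
```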
Well, this is exactly the type of use in which collisions do matter. Do you really want the dump to not record the correct data when some miscreant creates an intentional collision?
Message: 2
Date: Sun, 18 Sep 2011 17:00:32 -0400
From: Anthony wikimail@inbox.org
Subject: [Wikitech-l] Fwd: Adding MD5 / SHA1 column to revision table (discussing r94289)
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Message-ID: <CAPreJLQr4tTyBkrhwc5Lnf6Xw93eYDv02jAtCNtzRp910_CZ-Q@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Sun, Sep 18, 2011 at 1:55 AM, Robert Rohde rarohde@gmail.com wrote:
If collision attacks really matter we should use SHA-1.
If collision attacks really matter you should use, at least, SHA-256, no?
However, do any of the proposed use cases care about whether someone might intentionally inject a collision? In the proposed uses I've looked at, it seems irrelevant. The intentional collision will get flagged as a revert and the text leading to that collision would be discarded. How is that a bad thing?
Well, what if the checksum of the initial page hasn't been calculated yet? Then some miscreant sets the page to spam which collides, and then the spam gets reverted. The good page would be the one that gets thrown out.
Maybe that's not feasible. Maybe it is. Either way, I'd feel very uncomfortable about the fact that someday someone might decide to use the checksums in some way in which collisions would matter.
Now I don't know how important the CPU differences in calculating the two versions would be. If they're significant enough, then fine, use MD5, but make sure there are warnings all over the place about its use.
(As another possibility, what if someone writes a bot to detect certain reverts? I can see spammers/vandals having a field day with this sort of thing.)
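Here is a minimal Python sketch of hash-based identity-revert detection that makes the concern concrete. The function name and the default SHA-1 digest are assumptions. Only digests are compared, so any two texts that collide under the chosen hash are treated as the same revision:

```python
import hashlib
from typing import Callable, Dict, List, Optional


def detect_reverts(texts: List[bytes],
                   digest: Callable[[bytes], str] =
                   lambda t: hashlib.sha1(t).hexdigest()
                   ) -> List[Optional[int]]:
    """For each revision, return the index of the earliest earlier revision
    with the same digest (an identity revert), or None.

    The texts themselves are never compared, so a collision is
    indistinguishable from a genuine revert here.
    """
    first_seen: Dict[str, int] = {}
    result: List[Optional[int]] = []
    for i, text in enumerate(texts):
        d = digest(text)
        result.append(first_seen.get(d))
        first_seen.setdefault(d, i)
    return result
```

Passing a deliberately weak digest such as `lambda t: str(len(t))` shows the failure mode. With `[b"good", b"spam"]`, the spam is reported as a revert to the good text.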
For offline analyses, there's no need to change the online database tables.
Need? That's debatable, but one of the major motivators is the desire to have hash values in database dumps (both for revert checks and for checksums on correct data import / export). Both of those are "offline" uses, but it is beneficial to have that information precomputed and stored rather than frequently regenerated.
Why not in a separate file? There's no need to get permission from anyone or mess with the schema to generate a file with revision ids and checksums. If WMF won't host it at the regular dump location (which I can't see why they wouldn't), you could host it at archive.org.
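A separate checksum file could be as simple as one line per revision. A Python sketch follows. The tab-separated layout and the use of SHA-1 are assumptions, since any format that pairs a revision id with its digest would serve:

```python
import hashlib
from typing import Iterable, TextIO, Tuple


def write_checksum_file(revisions: Iterable[Tuple[int, bytes]], out: TextIO) -> int:
    """Write one 'rev_id<TAB>sha1' line per (rev_id, text) pair.

    Returns the number of lines written.
    """
    count = 0
    for rev_id, text in revisions:
        out.write(f"{rev_id}\t{hashlib.sha1(text).hexdigest()}\n")
        count += 1
    return count
```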
Message: 3
Date: Sun, 18 Sep 2011 17:30:52 -0400
From: Chad innocentkiller@gmail.com
Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Message-ID: <CADn73rM9R26GnyXGAFEC6_8Jb3AbT6ML0sVYyR-E4cXaZ2WR3g@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson rnnelson@clarkson.edu wrote:
It is meaningless to talk about cryptography without a threat model, just as Robert says. Is anybody actually attacking us? Or are we worried about accidental collisions?
I believe it began as accidental collisions, then everyone promptly put on their tinfoil hats and started talking about a hypothetical vandal who has the time and desire to generate hash collisions.
-Chad
Message: 4
Date: Sun, 18 Sep 2011 17:47:51 -0400
From: Anthony wikimail@inbox.org
Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Message-ID: <CAPreJLSmMi4qqZLZmY3LzOmEO-8JgqjpJgmLfh-8+gPx4v5Rmg@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Sun, Sep 18, 2011 at 5:30 PM, Chad innocentkiller@gmail.com wrote:
On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson rnnelson@clarkson.edu wrote:
It is meaningless to talk about cryptography without a threat model, just as Robert says. Is anybody actually attacking us? Or are we worried about accidental collisions?
I believe it began as accidental collisions, then everyone promptly put on their tinfoil hats and started talking about a hypothetical vandal who has the time and desire to generate hash collisions.
Having run a wiki which I eventually abandoned due to various "Grawp attacks", I can assure you that there's nothing hypothetical about it.
Message: 5
Date: Sun, 18 Sep 2011 17:50:12 -0400
From: Chad innocentkiller@gmail.com
Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Message-ID: CADn73rMGkgSPG4nbvB34EKfKp99d5LWcNuALQLrpUta45YaHiA@mail.gmail.com
Content-Type: text/plain; charset=UTF-8
On Sun, Sep 18, 2011 at 5:47 PM, Anthony wikimail@inbox.org wrote:
On Sun, Sep 18, 2011 at 5:30 PM, Chad innocentkiller@gmail.com wrote:
On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson rnnelson@clarkson.edu wrote:
It is meaningless to talk about cryptography without a threat model, just as Robert says. Is anybody actually attacking us? Or are we worried about accidental collisions?
I believe it began as accidental collisions, then everyone promptly put on their tinfoil hats and started talking about a hypothetical vandal who has the time and desire to generate hash collisions.
Having run a wiki which I eventually abandoned due to various "Grawp attacks", I can assure you that there's nothing hypothetical about it.
For those of us who do not know...what the heck is a Grawp attack? Does it involve generating hash collisions?
-Chad
Message: 6
Date: Mon, 19 Sep 2011 00:00:11 +0200
From: Roan Kattouw roan.kattouw@gmail.com
Subject: Re: [Wikitech-l] Fwd: Adding MD5 / SHA1 column to revision table (discussing r94289)
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Message-ID: CALoQHwEOyjQhzRKJM_efPCz7OrG=GbubZ7wMGyqmzrwBGZE6zg@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1
On Sun, Sep 18, 2011 at 11:00 PM, Anthony wikimail@inbox.org wrote:
Now I don't know how important the CPU differences in calculating the two versions would be. If they're significant enough, then fine, use MD5, but make sure there are warnings all over the place about its use.
I ran some benchmarks on one of the WMF machines. The input I used is a 137.5 MB (144,220,582 bytes) OGV file that someone asked me to upload to Commons recently. For each benchmark, I hashed the file 25 times and computed the average running time.
MD5: 393 ms
SHA-1: 404 ms
SHA-256: 1281 ms
Note that the input size is many times higher than $wgMaxArticleSize, which is set to 2000 KB at WMF. For historical reasons, we have some revisions in our history that are larger; Ariel would be able to tell you how large, but I believe nothing in there is larger than 10 MB. So I decided to run the numbers for more realistic sizes as well, using the first 2 MB and 10 MB, respectively, of my OGV file.
For 2 MB (averages of 1000 runs):
MD5: 5.66 ms
SHA-1: 5.85 ms
SHA-256: 18.56 ms
For 10 MB (averages of 200 runs):
MD5: 28.6 ms
SHA-1: 29.47 ms
SHA-256: 93.49 ms
So yes, SHA-256 is a few times (just over 3x) more expensive to compute than SHA-1, which in turn is only a few percent slower than MD5. However, on the largest possible size we allow for new revisions it takes < 20ms. It sounds like that's an acceptable worst case for on-the-fly population, since saves and parses are slow anyway, especially for 2 MB of wikitext. The 10 MB case is only relevant for backfilling, which we could do from a maintenance script, and < 100ms is definitely acceptable there.
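Anyone wanting to repeat this kind of measurement could start from the Python sketch below. It follows the same method of hashing one input repeatedly and averaging the wall-clock time. Random bytes stand in for the OGV file, and this is not the script that produced the numbers above:

```python
import hashlib
import os
import time


def benchmark(size_bytes: int, runs: int) -> dict:
    """Average wall-clock milliseconds to hash `size_bytes` of data, over `runs` runs,
    for MD5, SHA-1 and SHA-256."""
    data = os.urandom(size_bytes)  # random input instead of a real file
    results = {}
    for name in ("md5", "sha1", "sha256"):
        start = time.perf_counter()
        for _ in range(runs):
            hashlib.new(name, data).digest()
        results[name] = (time.perf_counter() - start) * 1000 / runs
    return results
```

For example, `benchmark(2 * 1024 * 1024, 1000)` and `benchmark(10 * 1024 * 1024, 200)` roughly mirror the 2 MB and 10 MB cases. Absolute numbers will depend on the machine and the Python build.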
Roan Kattouw (Catrope)
Message: 7
Date: Mon, 19 Sep 2011 00:07:32 +0200
From: Platonides Platonides@gmail.com
Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)
To: wikitech-l@lists.wikimedia.org
Message-ID: j55pnn$o1j$1@dough.gmane.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Chad wrote:
For those of us who do not know...what the heck is a Grawp attack? Does it involve generating hash collisions?
-Chad
It's the name of a Wikipedia vandal. http://en.wikipedia.org/wiki/User:Grawp
Message: 8
Date: Sun, 18 Sep 2011 18:01:47 -0400
From: Anthony wikimail@inbox.org
Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Message-ID: CAPreJLRWeY5EhaxZb+wACoX4r5PpenW7fPMiEWrkiwNb=XaRUw@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1
On Sun, Sep 18, 2011 at 5:50 PM, Chad innocentkiller@gmail.com wrote:
On Sun, Sep 18, 2011 at 5:47 PM, Anthony wikimail@inbox.org wrote:
On Sun, Sep 18, 2011 at 5:30 PM, Chad innocentkiller@gmail.com wrote:
On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson rnnelson@clarkson.edu wrote:
It is meaningless to talk about cryptography without a threat model, just as Robert says. Is anybody actually attacking us? Or are we worried about accidental collisions?
I believe it began as accidental collisions, then everyone promptly put on their tinfoil hats and started talking about a hypothetical vandal who has the time and desire to generate hash collisions.
Having run a wiki which I eventually abandoned due to various "Grawp attacks", I can assure you that there's nothing hypothetical about it.
For those of us who do not know...what the heck is a Grawp attack? Does it involve generating hash collisions?
It does not involve generating hash collisions, but it involves finding various bugs in MediaWiki and using them to vandalise, often by injecting JavaScript. The best description I could find was at Encyclopedia Dramatica, which seems to have been taken down (there's a cache if you do a Google search for "grawp wikipedia"). There's also a description at http://en.wikipedia.org/wiki/User:Grawp , which does not do justice to the "mad hacker skillz" of this individual and his determination to find bugs in MediaWiki and exploit them.
If you did something as lame as relying on no one generating an MD5 collision (*), it would happen. If you use SHA-1, it may or may not happen, depending on how quickly computers get faster, and how many further attacks are made on the algorithm. If you use SHA-256 (**), it's significantly less likely to happen, and you'll probably have a warning in the form of an announcement on Slashdot that SHA-256 has been broken, before it happens.
(*) Something which I have done myself on my home computer in a couple minutes, and apparently now can be done in a couple seconds.
(**) Which, incidentally, is possibly the single most secure hash for Wikimedia to use at the current time. SHA-512 is significantly more "broken" than SHA-256, and the more theoretically secure hashes have received much less scrutiny than SHA-256. If you want to be more secure than SHA-256, you should combine SHA-256 with some other hashing algorithm.
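One common way to "combine SHA-256 with some other hashing algorithm" is to concatenate the two digests. In the Python sketch below, the choice of SHA3-256 as the second algorithm is arbitrary. With concatenation, two texts collide only if they collide under both hashes. Known results such as Joux's multicollision attack show, however, that concatenation is weaker than its total digest length alone would suggest:

```python
import hashlib


def combined_digest(text: bytes) -> str:
    """SHA-256 hex digest followed by SHA3-256 hex digest (128 hex chars total).

    A collision on this value requires a simultaneous collision in both hashes.
    """
    return hashlib.sha256(text).hexdigest() + hashlib.sha3_256(text).hexdigest()
```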
Message: 9
Date: Sun, 18 Sep 2011 18:06:21 -0400
From: Anthony wikimail@inbox.org
Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Message-ID: CAPreJLQ0YUq9j8zr52Lme2eo=ijyjN6x6CssF=xcFcsTo6YTUg@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1
On Sun, Sep 18, 2011 at 6:01 PM, Anthony wikimail@inbox.org wrote:
There's also a description at http://en.wikipedia.org/wiki/User:Grawp , which does not do justice to the "mad hacker skillz" of this individual and his intent on finding bugs in mediawiki and exploiting them.
(and/or the Grawp copycats - personally I don't know if it was "Grawp" himself or a copycat that attacked my wiki)
Message: 10
Date: Sun, 18 Sep 2011 18:12:34 -0400
From: Anthony wikimail@inbox.org
Subject: Re: [Wikitech-l] Fwd: Adding MD5 / SHA1 column to revision table (discussing r94289)
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Message-ID: <CAPreJLR7gd=jNMnrZ-bxyB0RPx7sdPOSzygKALx8S1Wg8N2Q1w@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Sun, Sep 18, 2011 at 6:00 PM, Roan Kattouw roan.kattouw@gmail.com wrote:
On Sun, Sep 18, 2011 at 11:00 PM, Anthony wikimail@inbox.org wrote:
Now I don't know how important the CPU differences in calculating the two versions would be. If they're significant enough, then fine, use MD5, but make sure there are warnings all over the place about its use.
I ran some benchmarks on one of the WMF machines. The input I used is a 137.5 MB (144,220,582 bytes) OGV file that someone asked me to upload to Commons recently. For each benchmark, I hashed the file 25 times and computed the average running time.
MD5: 393 ms
SHA-1: 404 ms
SHA-256: 1281 ms
Did you try any of the non-secure hash functions? If you're going to go with MD5, might as well go with the significantly faster CRC-64.
If you're just using it to detect reverts, then you can run the CRC-64 check first, and then confirm with a check of the entire message.
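The two-step check described above (cheap checksum first, full comparison only on a match) can be sketched in Python. Python's standard library has no CRC-64, so `zlib.crc32` stands in as the fast, non-cryptographic prefilter, and the two-step structure is the same either way. The class name is hypothetical, and keeping every earlier text in memory is a simplification:

```python
import zlib
from typing import Dict, List, Optional


class RevertDetector:
    """Detect identity reverts: checksum prefilter, then full-text confirmation."""

    def __init__(self) -> None:
        self._by_crc: Dict[int, List[int]] = {}  # checksum -> revision indices
        self._texts: List[bytes] = []            # earlier texts, for confirmation

    def add(self, text: bytes) -> Optional[int]:
        """Record a revision; return the index of an identical earlier one, or None."""
        crc = zlib.crc32(text)
        match = None
        for idx in self._by_crc.get(crc, []):
            # A checksum match may be a false positive, so compare the full text.
            if self._texts[idx] == text:
                match = idx
                break
        self._by_crc.setdefault(crc, []).append(len(self._texts))
        self._texts.append(text)
        return match
```

Because the full comparison decides the outcome, a crafted checksum collision can only cost an extra comparison. It cannot produce a false revert.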
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
End of Wikitech-l Digest, Vol 98, Issue 30