Hi,
It might be good to keep a private hash in parallel with the MD5 public hash.
cheers,
Jamie
----- Original Message -----
From: wikitech-l-request(a)lists.wikimedia.org
Date: Sunday, September 18, 2011 3:12 pm
Subject: Wikitech-l Digest, Vol 98, Issue 30
To: wikitech-l(a)lists.wikimedia.org
Send Wikitech-l mailing list submissions to
wikitech-l(a)lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
or, via email, send a message with subject or body 'help' to
wikitech-l-request(a)lists.wikimedia.org
You can reach the person managing the list at
wikitech-l-owner(a)lists.wikimedia.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Wikitech-l digest..."
Today's Topics:
1. Re: Adding MD5 / SHA1 column to revision table
(discussing r94289) (Anthony)
2. Fwd: Adding MD5 / SHA1 column to revision table
(discussing r94289) (Anthony)
3. Re: Adding MD5 / SHA1 column to revision table
(discussing r94289) (Chad)
4. Re: Adding MD5 / SHA1 column to revision table
(discussing r94289) (Anthony)
5. Re: Adding MD5 / SHA1 column to revision table
(discussing r94289) (Chad)
6. Re: Fwd: Adding MD5 / SHA1 column to revision table
(discussing r94289) (Roan Kattouw)
7. Re: Adding MD5 / SHA1 column to revision table
(discussing r94289) (Platonides)
8. Re: Adding MD5 / SHA1 column to revision table
(discussing r94289) (Anthony)
9. Re: Adding MD5 / SHA1 column to revision table
(discussing r94289) (Anthony)
10. Re: Fwd: Adding MD5 / SHA1 column to revision table
(discussing r94289) (Anthony)
----------------------------------------------------------------------
Message: 1
Date: Sun, 18 Sep 2011 16:57:22 -0400
From: Anthony <wikimail(a)inbox.org>
Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
(discussing r94289)
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Message-ID:
<CAPreJLR8Rhut8gdqizxmDuo5-CAd3Yi_S-G478071LD=fg9XTQ(a)mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-7
On Sun, Sep 18, 2011 at 2:33 AM, Ariel T. Glenn
<ariel(a)wikimedia.org> wrote:
> On 17-09-2011 at 22:55 -0700, Robert Rohde wrote:
>> On Sat, Sep 17, 2011 at 4:56 PM, Anthony <wikimail(a)inbox.org> wrote:
>> <snip>
>>> For offline analyses, there's no need to change the online database
>>> tables.
>>
>> Need? That's debatable, but one of the major motivators is the desire
>> to have hash values in database dumps (both for revert checks and for
>> checksums on correct data import / export). Both of those are
>> "offline" uses, but it is beneficial to have that information
>> precomputed and stored rather than frequently regenerated.
>
> If we don't have it in the online database tables, this defeats the
> purpose of having the value in there at all, for the purpose of
> generating the XML dumps.
>
> Recall that the dumps are generated in two passes; in the first pass we
> retrieve from the db and record all of the metadata about revisions, and
> in the second (time-consuming) pass we re-use the text of the revisions
> from a previous dump file if the text is in there. We want to compare
> the hash of that text against what the online database says the hash is;
> if they don't match, we want to fetch the live copy.

Well, this is exactly the type of use in which collisions do matter.
Do you really want the dump to not record the correct data when some
miscreant creates an intentional collision?
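
The second dump pass quoted above could be sketched roughly as follows.
This is only an illustration under stated assumptions, not the actual
dump code: pick_revision_text, previous_dump and fetch_live_text are
hypothetical names, and SHA-1 stands in for whichever hash is chosen.

```python
import hashlib


def text_hash(text: str) -> str:
    """Hex SHA-1 of the UTF-8 encoded revision text (example hash choice)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def pick_revision_text(rev_id, db_hash, previous_dump, fetch_live_text):
    """Reuse text from the previous dump only if its hash matches the database.

    previous_dump: dict mapping rev_id -> text (hypothetical stand-in for
        the old dump file)
    fetch_live_text: callable rev_id -> text (hypothetical stand-in for a
        fetch from the live database)
    """
    cached = previous_dump.get(rev_id)
    if cached is not None and text_hash(cached) == db_hash:
        return cached
    # Missing from the old dump, or the hash disagrees: fetch the live copy.
    return fetch_live_text(rev_id)
```

With a check like this, a deliberately colliding text in the previous
dump would pass the comparison and be reused, which is the case the
question above is about.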
------------------------------
Message: 2
Date: Sun, 18 Sep 2011 17:00:32 -0400
From: Anthony <wikimail(a)inbox.org>
Subject: [Wikitech-l] Fwd: Adding MD5 / SHA1 column to revision table
(discussing r94289)
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Message-ID:
<CAPreJLQr4tTyBkrhwc5Lnf6Xw93eYDv02jAtCNtzRp910_CZ-Q(a)mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Sun, Sep 18, 2011 at 1:55 AM, Robert Rohde
<rarohde(a)gmail.com> wrote:
> If collision attacks really matter we should use SHA-1.

If collision attacks really matter you should use, at least, SHA-256, no?

> However, do any of the proposed use cases care about whether someone
> might intentionally inject a collision? In the proposed uses I've
> looked at, it seems irrelevant. The intentional collision will get
> flagged as a revert and the text leading to that collision would be
> discarded. How is that a bad thing?

Well, what if the checksum of the initial page hasn't been calculated
yet? Then some miscreant sets the page to spam which collides, and
then the spam gets reverted. The good page would be the one that gets
thrown out.

Maybe that's not feasible. Maybe it is. Either way, I'd feel very
uncomfortable about the fact that someday someone might decide to use
the checksums in some way in which collisions would matter.

Now I don't know how important the CPU differences in calculating the
two versions would be. If they're significant enough, then fine, use
MD5, but make sure there are warnings all over the place about its
use.

(As another possibility, what if someone writes a bot to detect
certain reverts? I can see spammers/vandals having a field day with
this sort of thing.)

>> For offline analyses, there's no need to change the online database
>> tables.
>
> Need? That's debatable, but one of the major motivators is the desire
> to have hash values in database dumps (both for revert checks and for
> checksums on correct data import / export). Both of those are
> "offline" uses, but it is beneficial to have that information
> precomputed and stored rather than frequently regenerated.

Why not in a separate file? There's no need to get permission from
anyone or mess with the schema to generate a file with revision ids
and checksums. If WMF won't host it at the regular dump location
(which I can't see why they wouldn't), you could host it at
archive.org.
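
A separate file of revision ids and checksums could be produced with
something as small as the sketch below. The tab-separated layout, the
function name write_checksum_file and the (rev_id, text) input are all
assumptions made for illustration; SHA-1 is just an example hash.

```python
import csv
import hashlib
import io


def write_checksum_file(revisions, out):
    """Write one 'rev_id<TAB>sha1' line per revision.

    revisions: iterable of (rev_id, text) pairs, a hypothetical stand-in
        for whatever reads the dump
    out: any text file-like object
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for rev_id, text in revisions:
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        writer.writerow([rev_id, digest])


if __name__ == "__main__":
    buf = io.StringIO()
    write_checksum_file([(1, "first revision"), (2, "second revision")], buf)
    print(buf.getvalue(), end="")
```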
------------------------------
Message: 3
Date: Sun, 18 Sep 2011 17:30:52 -0400
From: Chad <innocentkiller(a)gmail.com>
Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
(discussing r94289)
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Message-ID:
<CADn73rM9R26GnyXGAFEC6_8Jb3AbT6ML0sVYyR-E4cXaZ2WR3g(a)mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson
<rnnelson(a)clarkson.edu> wrote:
> It is meaningless to talk about cryptography without a threat
> model, just as Robert says. Is anybody actually attacking us? Or
> are we worried about accidental collisions?

I believe it began as accidental collisions, then everyone promptly
put on their tinfoil hats and started talking about a hypothetical
vandal who has the time and desire to generate hash collisions.
-Chad
------------------------------
Message: 4
Date: Sun, 18 Sep 2011 17:47:51 -0400
From: Anthony <wikimail(a)inbox.org>
Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
(discussing r94289)
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Message-ID:
<CAPreJLSmMi4qqZLZmY3LzOmEO-8JgqjpJgmLfh-8+gPx4v5Rmg(a)mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Sun, Sep 18, 2011 at 5:30 PM, Chad
<innocentkiller(a)gmail.com> wrote:
> On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson
> <rnnelson(a)clarkson.edu> wrote:
>> It is meaningless to talk about cryptography without a threat
>> model, just as Robert says. Is anybody actually attacking us? Or
>> are we worried about accidental collisions?
>
> I believe it began as accidental collisions, then everyone promptly
> put on their tinfoil hats and started talking about a hypothetical
> vandal who has the time and desire to generate hash collisions.

Having run a wiki which I eventually abandoned due to various "Grawp
attacks", I can assure you that there's nothing hypothetical about it.
------------------------------
Message: 5
Date: Sun, 18 Sep 2011 17:50:12 -0400
From: Chad <innocentkiller(a)gmail.com>
Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
(discussing r94289)
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Message-ID:
<CADn73rMGkgSPG4nbvB34EKfKp99d5LWcNuALQLrpUta45YaHiA(a)mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
On Sun, Sep 18, 2011 at 5:47 PM, Anthony <wikimail(a)inbox.org> wrote:
> On Sun, Sep 18, 2011 at 5:30 PM, Chad <innocentkiller(a)gmail.com> wrote:
>> On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson
>> <rnnelson(a)clarkson.edu> wrote:
>>> It is meaningless to talk about cryptography without a threat
>>> model, just as Robert says. Is anybody actually attacking us? Or
>>> are we worried about accidental collisions?
>>
>> I believe it began as accidental collisions, then everyone promptly
>> put on their tinfoil hats and started talking about a hypothetical
>> vandal who has the time and desire to generate hash collisions.
>
> Having run a wiki which I eventually abandoned due to various "Grawp
> attacks", I can assure you that there's nothing hypothetical about it.

For those of us who do not know...what the heck is a Grawp attack?
Does it involve generating hash collisions?
-Chad
------------------------------
Message: 6
Date: Mon, 19 Sep 2011 00:00:11 +0200
From: Roan Kattouw <roan.kattouw(a)gmail.com>
Subject: Re: [Wikitech-l] Fwd: Adding MD5 / SHA1 column to revision
table (discussing r94289)
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Message-ID:
<CALoQHwEOyjQhzRKJM_efPCz7OrG=GbubZ7wMGyqmzrwBGZE6zg(a)mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Sun, Sep 18, 2011 at 11:00 PM, Anthony <wikimail(a)inbox.org> wrote:
> Now I don't know how important the CPU differences in calculating the
> two versions would be. If they're significant enough, then fine, use
> MD5, but make sure there are warnings all over the place about its
> use.

I ran some benchmarks on one of the WMF machines. The input I used is
a 137.5 MB (144,220,582 bytes) OGV file that someone asked me to
upload to Commons recently. For each benchmark, I hashed the file 25
times and computed the average running time.

MD5: 393 ms
SHA-1: 404 ms
SHA-256: 1281 ms

Note that the input size is many times higher than $wgMaxArticleSize,
which is set to 2000 KB at WMF. For historical reasons, we have some
revisions in our history that are larger; Ariel would be able to tell
you how large, but I believe nothing in there is larger than 10 MB. So
I decided to run the numbers for more realistic sizes as well, using
the first 2 MB and 10 MB, respectively, of my OGV file.

For 2 MB (averages of 1000 runs):
MD5: 5.66 ms
SHA-1: 5.85 ms
SHA-256: 18.56 ms

For 10 MB (averages of 200 runs):
MD5: 28.6 ms
SHA-1: 29.47 ms
SHA-256: 93.49 ms

So yes, SHA-256 is a few times (just over 3x) more expensive to
compute than SHA-1, which in turn is only a few percent slower than
MD5. However, on the largest possible size we allow for new revisions
it takes < 20ms. It sounds like that's an acceptable worst case for
on-the-fly population, since saves and parses are slow anyway,
especially for 2 MB of wikitext. The 10 MB case is only relevant for
backfilling, which we could do from a maintenance script, and < 100ms
is definitely acceptable there.

Roan Kattouw (Catrope)
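
A comparable measurement can be approximated with a short script like
the one below. It is a sketch, not the benchmark that produced the
numbers above: the helper name average_hash_ms is hypothetical, the
input is 2 MB of zero bytes rather than a slice of the OGV file, and
timings will differ by machine and library build.

```python
import hashlib
import time


def average_hash_ms(algorithm: str, data: bytes, runs: int) -> float:
    """Average wall-clock time in milliseconds to hash data once."""
    start = time.perf_counter()
    for _ in range(runs):
        hashlib.new(algorithm, data).digest()
    return (time.perf_counter() - start) * 1000 / runs


if __name__ == "__main__":
    payload = bytes(2 * 1024 * 1024)  # 2 MB of zero bytes, not real wikitext
    for name in ("md5", "sha1", "sha256"):
        print(f"{name}: {average_hash_ms(name, payload, runs=20):.2f} ms")
```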
------------------------------
Message: 7
Date: Mon, 19 Sep 2011 00:07:32 +0200
From: Platonides <Platonides(a)gmail.com>
Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
(discussing r94289)
To: wikitech-l(a)lists.wikimedia.org
Message-ID: <j55pnn$o1j$1(a)dough.gmane.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Chad wrote:
> For those of us who do not know...what the heck is a Grawp attack?
> Does it involve generating hash collisions?
> -Chad
It's the name of a wikipedia vandal.
http://en.wikipedia.org/wiki/User:Grawp
------------------------------
Message: 8
Date: Sun, 18 Sep 2011 18:01:47 -0400
From: Anthony <wikimail(a)inbox.org>
Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
(discussing r94289)
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Message-ID:
<CAPreJLRWeY5EhaxZb+wACoX4r5PpenW7fPMiEWrkiwNb=XaRUw(a)mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Sun, Sep 18, 2011 at 5:50 PM, Chad
<innocentkiller(a)gmail.com> wrote:
> On Sun, Sep 18, 2011 at 5:47 PM, Anthony <wikimail(a)inbox.org> wrote:
>> On Sun, Sep 18, 2011 at 5:30 PM, Chad <innocentkiller(a)gmail.com> wrote:
>>> On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson
>>> <rnnelson(a)clarkson.edu> wrote:
>>>> It is meaningless to talk about cryptography without a threat
>>>> model, just as Robert says. Is anybody actually attacking us? Or
>>>> are we worried about accidental collisions?
>>>
>>> I believe it began as accidental collisions, then everyone promptly
>>> put on their tinfoil hats and started talking about a hypothetical
>>> vandal who has the time and desire to generate hash collisions.
>>
>> Having run a wiki which I eventually abandoned due to various "Grawp
>> attacks", I can assure you that there's nothing hypothetical about it.
>
> For those of us who do not know...what the heck is a Grawp attack?
> Does it involve generating hash collisions?

It does not involve generating hash collisions, but it involves
finding various bugs in mediawiki and using them to vandalise, often
by injecting javascript. The best description I could find was at
Encyclopedia Dramatica, which seems to be taken down (there's a cache
if you do a google search for "grawp wikipedia"). There's also a
description at http://en.wikipedia.org/wiki/User:Grawp , which does
not do justice to the "mad hacker skillz" of this individual and his
intent on finding bugs in mediawiki and exploiting them.

If you did something as lame as relying on no one generating an MD5
collision (*), it would happen. If you use SHA-1, it may or may not
happen, depending on how quickly computers get faster, and how many
further attacks are made on the algorithm. If you use SHA-256 (**),
it's significantly less likely to happen, and you'll probably have a
warning in the form of an announcement on Slashdot that SHA-256 has
been broken, before it happens.

(*) Something which I have done myself on my home computer in a couple
minutes, and apparently now can be done in a couple seconds.

(**) Which, incidentally, is possibly the single most secure hash for
Wikimedia to use at the current time. SHA-512 is significantly more
"broken" than SHA-256, and the more theoretically secure hashes have
received much less scrutiny than SHA-256. If you want to be more
secure than SHA-256, you should combine SHA-256 with some other
hashing algorithm.
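
One simple reading of "combine SHA-256 with some other hashing
algorithm" is to store both digests side by side, so that a forged
text would have to collide under both functions at once. The sketch
below assumes plain concatenation; BLAKE2b is an arbitrary second
algorithm picked for illustration and was not proposed in this thread,
and combined_digest is a hypothetical name.

```python
import hashlib


def combined_digest(data: bytes) -> str:
    """Concatenate the SHA-256 and BLAKE2b hex digests of the same input."""
    first = hashlib.sha256(data).hexdigest()    # 64 hex characters
    second = hashlib.blake2b(data).hexdigest()  # 128 hex characters
    return first + second
```

The cost is a longer stored value and two hash computations per
revision.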
------------------------------
Message: 9
Date: Sun, 18 Sep 2011 18:06:21 -0400
From: Anthony <wikimail(a)inbox.org>
Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
(discussing r94289)
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Message-ID:
<CAPreJLQ0YUq9j8zr52Lme2eo=ijyjN6x6CssF=xcFcsTo6YTUg(a)mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Sun, Sep 18, 2011 at 6:01 PM, Anthony <wikimail(a)inbox.org> wrote:
> There's also a description at
> http://en.wikipedia.org/wiki/User:Grawp , which does
> not do justice to the "mad hacker skillz" of this individual and his
> intent on finding bugs in mediawiki and exploiting them.

(and/or the Grawp copycats - personally I don't know if it was "Grawp"
himself or a copycat that attacked my wiki)
------------------------------
Message: 10
Date: Sun, 18 Sep 2011 18:12:34 -0400
From: Anthony <wikimail(a)inbox.org>
Subject: Re: [Wikitech-l] Fwd: Adding MD5 / SHA1 column to revision
table (discussing r94289)
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Message-ID:
<CAPreJLR7gd=jNMnrZ-bxyB0RPx7sdPOSzygKALx8S1Wg8N2Q1w(a)mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Sun, Sep 18, 2011 at 6:00 PM, Roan Kattouw
<roan.kattouw(a)gmail.com> wrote:
> On Sun, Sep 18, 2011 at 11:00 PM, Anthony <wikimail(a)inbox.org> wrote:
>> Now I don't know how important the CPU differences in calculating the
>> two versions would be. If they're significant enough, then fine, use
>> MD5, but make sure there are warnings all over the place about its
>> use.
>
> I ran some benchmarks on one of the WMF machines. The input I used is
> a 137.5 MB (144,220,582 bytes) OGV file that someone asked me to
> upload to Commons recently. For each benchmark, I hashed the file 25
> times and computed the average running time.
>
> MD5: 393 ms
> SHA-1: 404 ms
> SHA-256: 1281 ms

Did you try any of the non-secure hash functions? If you're going to
go with MD5, might as well go with the significantly faster CRC-64.
If you're just using it to detect reverts, then you can run the CRC-64
check first, and then confirm with a check of the entire message.
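
The "cheap check first, then confirm" idea could look like the sketch
below. Python's standard library has no CRC-64, so zlib.crc32 is used
here purely as a stand-in for a fast non-cryptographic checksum; the
function name find_reverted_to and the layout of the history list are
hypothetical.

```python
import zlib


def find_reverted_to(new_text: str, history):
    """Return the rev_id of an earlier revision with identical text, or None.

    history: list of (rev_id, checksum, text) tuples, oldest first
        (hypothetical layout; checksum is zlib.crc32 of the UTF-8 text)
    """
    checksum = zlib.crc32(new_text.encode("utf-8"))
    for rev_id, old_checksum, old_text in reversed(history):
        # Cheap filter first; a checksum match alone is never trusted,
        # so the full text is compared before calling it a revert.
        if old_checksum == checksum and old_text == new_text:
            return rev_id
    return None
```

Because the full text is compared on every checksum match, a collision
costs an extra comparison but cannot cause a false revert.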
------------------------------
End of Wikitech-l Digest, Vol 98, Issue 30
******************************************