On Wed, Aug 18, 2010 at 11:37 PM, Tim Starling <tstarling@wikimedia.org> wrote: <snip>
The idea I came up with is to hash the output of str_repeat(). This increases the number of rounds of the compression function, while avoiding tight loops in PHP code.
<snip>
My proposed hash function is a B-type MD5 salted hash, which is then further hashed with a configurable number of invocations of WHIRLPOOL, with a 256-bit substring taken from a MediaWiki-specific location. The input to each WHIRLPOOL operation is expanded by a factor of 100 with str_repeat().
<snip>
Let me preface my comment by saying that I haven't studied WHIRLPOOL, and the following may not apply to it at all.
However, it is known that some block-cipher-based hashes behave poorly when fed repeated copies of the same block. In the worst cases the hash space is substantially truncated from its full size (which is probably not the case for any serious cryptographic hash function). In less severe cases, cryptanalysis can find a new block cipher W' such that one application of W' is equivalent to N applications of block cipher W. If WHIRLPOOL is vulnerable to that kind of attack, an attacker could skip most of the extra work, which would negate the effect of using str_repeat in your code.
Like I said, I don't know if either concern applies to WHIRLPOOL. However, these concerns only arise because the 256-bit string you are repeating evenly divides the 512-bit block size used by WHIRLPOOL, so every full block of the expanded input holds exactly two copies of the string and all of those blocks are identical. It is trivial to avoid the whole issue simply by using a different length for the repeated string. For example, 97 copies of a 33-byte string (3201 bytes, versus 3200 bytes for 100 copies of 32 bytes) should have essentially the same computational cost, while making any associated cryptanalytic threat impossible (or at least less likely).
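To make the alignment point concrete, here is a rough Python sketch (not part of the proposed PHP code) that only counts how many distinct 64-byte input blocks each repetition scheme would hand to the compression function. It does not run WHIRLPOOL at all; the function name distinct_blocks and the sample byte strings are made up for illustration, and the final padded block is ignored.

```python
BLOCK_BYTES = 64  # WHIRLPOOL consumes its input in 512-bit (64-byte) blocks


def distinct_blocks(unit: bytes, copies: int) -> int:
    """Count distinct full 64-byte blocks in `unit` repeated `copies` times.

    Hypothetical helper for illustration only; trailing bytes that do not
    fill a whole block (and the hash's own padding) are ignored.
    """
    data = unit * copies
    full = len(data) // BLOCK_BYTES
    blocks = {data[i * BLOCK_BYTES:(i + 1) * BLOCK_BYTES] for i in range(full)}
    return len(blocks)


# Stand-ins for the repeated string; real input would be hash output.
unit32 = bytes(range(32))  # 256 bits, divides the block size evenly
unit33 = bytes(range(33))  # 264 bits, does not divide the block size

aligned = distinct_blocks(unit32, 100)    # 3200 bytes of input
misaligned = distinct_blocks(unit33, 97)  # 3201 bytes of input

print(aligned, misaligned)
```

With the 32-byte string every full block is the same, whereas the 33-byte string shifts by one byte relative to each pair of copies, so the blocks cycle through many different alignments for nearly identical input length.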
My only other comment is something you presumably already know. Your proposal is still nothing but an arms race. It makes hashes harder to crack by making the hash function itself much more computationally expensive. However, you'd still have to periodically raise the number of WHIRLPOOL invocations to stay well ahead of the hackers' hardware.
As a complementary approach it would be nice if there were something in MediaWiki to aid in the selection of strong passwords. Regardless of hash function, it will still take roughly 2.7 billion times longer to exhaustively search for one 10-character password in [A-Za-z0-9] than for a 6-character password in [a-z] (62^10 versus 26^6 candidates). Even if password strength testing algorithms were disabled on Wikipedia sites, it would still be a nice addition to have in the MediaWiki codebase in general.
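The size of that gap is easy to check. The following Python snippet is just the brute-force keyspace arithmetic, assuming an attacker who tries every candidate at a fixed cost per guess; the variable names are mine.

```python
import string

# Candidates for a 10-character password drawn from [A-Za-z0-9].
strong = len(string.ascii_letters + string.digits) ** 10  # 62 ** 10

# Candidates for a 6-character password drawn from [a-z].
weak = len(string.ascii_lowercase) ** 6  # 26 ** 6

# How many times larger the strong keyspace is than the weak one.
ratio = strong / weak

print(f"{strong} / {weak} = {ratio:.3e}")
```

The ratio comes out a little under three billion, which is a far bigger factor than any practical increase in the hash's iteration count would buy.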
-Robert Rohde