On Wed, Aug 18, 2010 at 11:37 PM, Tim Starling <tstarling(a)wikimedia.org> wrote:
<snip>
The idea I came up with is to hash the output of str_repeat(). This
increases the number of rounds of the compression function, while
avoiding tight loops in PHP code.
<snip>
My proposed hash function is a B-type MD5 salted hash, which is then
further hashed with a configurable number of invocations of WHIRLPOOL,
with a 256-bit substring taken from a MediaWiki-specific location. The
input to each WHIRLPOOL operation is expanded by a factor of 100 with
str_repeat().
<snip>
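For reference, here is a minimal Python sketch of the scheme as quoted above, written under several assumptions that are not in the proposal itself. It assumes the ":B:" body is MediaWiki's md5(salt . "-" . md5(password)). It uses WHIRLPOOL only if the local OpenSSL exposes it and otherwise falls back to SHA-512 as a stand-in. The `offset` parameter is a hypothetical placeholder for the unspecified "MediaWiki-specific location" of the 256-bit substring. The point it illustrates is that `value * repeat` (Python's equivalent of str_repeat()) pushes ~3200 bytes through the compression function per call, so the expensive work happens inside C rather than in an interpreted loop.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def b_type_hash(password: str, salt: str) -> str:
    """Assumed MediaWiki ':B:' hash body: md5(salt + '-' + md5(password)), hex-encoded."""
    return md5_hex(salt + "-" + md5_hex(password))


def big_hash(data: bytes) -> bytes:
    """WHIRLPOOL if this Python's OpenSSL provides it, otherwise SHA-512 as a stand-in."""
    try:
        h = hashlib.new("whirlpool")
    except ValueError:  # raised for unsupported digest names
        h = hashlib.sha512()
    h.update(data)
    return h.digest()  # both digests are 64 bytes


def stretched_hash(password: str, salt: str, iterations: int = 1000,
                   repeat: int = 100, offset: int = 0) -> str:
    """Hypothetical sketch: B-type MD5, then `iterations` rounds of big_hash.

    `offset` is an illustrative stand-in for the MediaWiki-specific location
    of the 256-bit (32-byte) substring kept after each round.
    """
    if not 0 <= offset <= 32:
        raise ValueError("offset must leave room for a 32-byte substring")
    # Start from the 32-character hex B-type hash, so every round hashes 32 * repeat bytes.
    value = b_type_hash(password, salt).encode("ascii")
    for _ in range(iterations):
        # Expanding the input 100-fold multiplies the compression-function calls
        # per round without adding interpreter-level loop iterations.
        value = big_hash(value * repeat)[offset:offset + 32]
    return value.hex()
```

The iteration count is the tunable cost knob; `repeat` trades fewer interpreter iterations for longer inputs per hash call.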
Let me preface my comment by saying that I haven't studied WHIRLPOOL,
and the following may not apply to it at all.
However, it is known that some block-cipher-based hashes behave poorly
when fed repeated copies of the same block. In the worst cases the
hash space is substantially truncated from its full size (which
probably is not the case for any serious cryptographic hash function).
In less severe cases, cryptanalysis can find a new block cipher W'
such that N applications of block cipher W are equivalent to one
application of W'. If WHIRLPOOL is vulnerable to that kind of attack,
it would negate the effect of using str_repeat() in your code.
Like I said, I don't know if either concern applies to WHIRLPOOL.
However, these concerns only arise because the length of the string you
are repeating (256 bits) evenly divides WHIRLPOOL's 512-bit block size,
so every full block fed to the compression function is identical. It is
therefore trivial to avoid the whole issue simply by using a different
repeated block size. For example, 97 copies of a 33-byte string (3,201
bytes, versus 3,200 bytes for 100 copies of 32 bytes) should have
essentially the same computational cost, while making any associated
cryptanalysis threat impossible (or at least less likely).
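To make the alignment point concrete, the following Python sketch splits a repeated string into 64-byte (512-bit) blocks and counts how many distinct full blocks the compression function would see. The unit bytes are arbitrary distinct placeholders, not a real hash substring, and the final padded block is ignored. With a 32-byte unit every block is the same; with a 33-byte unit the block start drifts by 31 bytes each time (64 mod 33), and since gcd(31, 33) = 1 it cycles through all 33 offsets before repeating.

```python
BLOCK = 64  # WHIRLPOOL processes 512-bit (64-byte) message blocks


def distinct_blocks(unit_len: int, copies: int) -> int:
    """Count distinct full 64-byte blocks in `copies` repetitions of a unit.

    The unit is filled with distinct placeholder bytes (requires unit_len <= 256),
    so two blocks are equal only if they start at the same offset within the unit.
    """
    unit = bytes(range(unit_len))
    data = unit * copies
    full = len(data) // BLOCK  # padding/final partial block is ignored here
    return len({data[i * BLOCK:(i + 1) * BLOCK] for i in range(full)})


print(distinct_blocks(32, 100))  # 1: every block identical
print(distinct_blocks(33, 97))   # 33: block contents cycle through all offsets
```

Both inputs are about the same length (3,200 vs 3,201 bytes), so the cost is essentially unchanged while the block-level repetition disappears.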
My only other comment is something you presumably already know. Your
proposal is still nothing but an arms race. It makes hashes harder to
crack by making the hash function itself much more computationally
expensive. However, you'd still have to periodically boost the repetition
count to stay well ahead of the hackers.
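One way to handle the periodic boost is to re-derive the round count from a time budget on current hardware, rather than hard-coding it. This is a minimal Python sketch under that assumption. SHA-512 stands in for WHIRLPOOL, and `target_seconds` is a hypothetical setting; measure with whatever hash is actually deployed.

```python
import hashlib
import time


def time_per_round(repeat: int = 100, samples: int = 200) -> float:
    """Average seconds for one round: hash a 32-byte value repeated `repeat` times."""
    value = b"\x00" * 32
    start = time.perf_counter()
    for _ in range(samples):
        hashlib.sha512(value * repeat).digest()  # stand-in for WHIRLPOOL
    elapsed = time.perf_counter() - start
    return max(elapsed / samples, 1e-9)  # guard against a zero reading


def calibrate_rounds(target_seconds: float, repeat: int = 100) -> int:
    """Round count that makes one password hash take roughly `target_seconds`."""
    return max(1, int(target_seconds / time_per_round(repeat)))


print(calibrate_rounds(0.05))  # machine-dependent
```

Rerunning the calibration as hardware improves, and rehashing passwords at the next successful login, is one way to keep the cost from falling behind.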
As a complementary approach, it would be nice if there were something in
MediaWiki to aid in the selection of strong passwords. Regardless of
hash function, it will still take roughly 2.7 billion times longer to
find one 10-character password in [A-Za-z0-9] than to find a 6-character
password in [a-z] (62^10 / 26^6 is about 2.7 x 10^9). Even if password
strength testing algorithms were disabled on Wikipedia sites, it would
still be a nice addition to have in the MediaWiki codebase in general.
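The search-space arithmetic, plus a deliberately crude strength estimate of the kind such a feature might start from, can be sketched in Python as follows. The character-class pool sizes (26, 26, 10, and 33 for printable ASCII punctuation plus space) are an assumption of this sketch. Real strength checkers also penalize dictionary words and patterns, which this does not attempt.

```python
import math
import string


def search_space(alphabet_size: int, length: int) -> int:
    """Number of candidate passwords of exactly `length` characters."""
    return alphabet_size ** length


def naive_entropy_bits(password: str) -> float:
    """Crude estimate: length * log2(size of the character classes used)."""
    pool = 0
    if any(c in string.ascii_lowercase for c in password):
        pool += 26
    if any(c in string.ascii_uppercase for c in password):
        pool += 26
    if any(c in string.digits for c in password):
        pool += 10
    if any(c not in string.ascii_letters + string.digits for c in password):
        pool += 33  # assumption: treat anything else as printable ASCII punctuation/space
    return len(password) * math.log2(pool) if pool else 0.0


strong = search_space(62, 10)  # 10 characters from [A-Za-z0-9]
weak = search_space(26, 6)     # 6 characters from [a-z]
print(strong / weak)           # about 2.7e9
```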
-Robert Rohde