On Thu, Aug 19, 2010 at 2:37 AM, Tim Starling <tstarling(a)wikimedia.org> wrote:
The problem with the standard key strengthening algorithms, e.g.
PBKDF1, is that they are not efficient in PHP. We don't want a C
implementation of our scheme to be orders of magnitude faster than our
PHP implementation, because that would allow brute-forcing to be more
feasible than is necessary.
The idea I came up with is to hash the output of str_repeat(). This
increases the number of rounds of the compression function, while
avoiding tight loops in PHP code.
Seems reasonable.
PHP's hash extension has been available by default since PHP 5.1.2,
and we can always fall back to using B-type hashes if it's explicitly
disabled. The WHIRLPOOL hash is supported. It has no patent or
copyright restrictions so it's not going to be yanked out of Debian or
PHP for legal reasons. It has a 512-bit block size, the largest of any
hash function available in PHP, and its security goals state that it
can be truncated without compromising its properties.
My proposed hash function is a B-type MD5 salted hash, which is then
further hashed with a configurable number of invocations of WHIRLPOOL,
with a 256-bit substring taken from a MediaWiki-specific location. The
input to each WHIRLPOOL operation is expanded by a factor of 100 with
str_repeat().
The number of WHIRLPOOL iterations is specified in the output string
as a base-2 logarithm (whimsically padded out to 3 decimal digits to
allow for future universe-sized computers). This number can be
upgraded by taking the hash part of the output and applying more
rounds to it. A count of 2^7 = 128 gives a time of 55ms on my laptop,
and 12ms on one of our servers, so a reasonable default is probably
2^6 or 2^7.
That seems reasonable. It could probably be done a lot faster on GPUs, I guess.
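Tim's scheme can be sketched roughly as follows. The function names, the exact salting, and the truncation offset are illustrative assumptions, not MediaWiki's actual code; the point is the shape of the construction: a B-type salted MD5, then 2^n WHIRLPOOL invocations over a 100x-expanded input, truncated to 256 bits.

```php
<?php
// One strengthening step: expand the input 100x so nearly all the time
// is spent in WHIRLPOOL's compression function rather than in the PHP
// loop, then truncate the 512-bit output to 256 bits (64 hex digits).
function wfWhirlRound( $hash ) {
    return substr( hash( 'whirlpool', str_repeat( $hash, 100 ) ), 32, 64 );
}

// Full hash: start from the B-type salted MD5, then apply 2^$logRounds
// strengthening steps. (Names and details are assumptions for
// illustration.)
function wfStrengthenHash( $password, $salt, $logRounds = 7 ) {
    $hash = md5( $salt . '-' . md5( $password ) );  // B-type salted hash
    for ( $i = 0, $n = 1 << $logRounds; $i < $n; $i++ ) {
        $hash = wfWhirlRound( $hash );
    }
    return $hash;
}

// Because the stored round count is a base-2 logarithm, an existing
// hash can be upgraded in place by applying the extra rounds to the
// stored hash part, without knowing the password.
function wfUpgradeHash( $hash, $oldLogRounds, $newLogRounds ) {
    $extra = ( 1 << $newLogRounds ) - ( 1 << $oldLogRounds );
    for ( $i = 0; $i < $extra; $i++ ) {
        $hash = wfWhirlRound( $hash );
    }
    return $hash;
}
```

Note the invariant that makes in-place upgrades work: upgrading a 2^4-round hash to 2^5 rounds yields the same value as computing 2^5 rounds from scratch.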
On Thu, Aug 19, 2010 at 4:45 AM, Daniel Kinzler <daniel(a)brightbyte.de> wrote:
I don't know that much about the mathematical details of hashing, but
I'd like to drop a pointer to an article I found interesting in this context:
"Stop using unsafe keyed hashes, use HMAC"
http://rdist.root.org/2009/10/29/stop-using-unsafe-keyed-hashes-use-hmac/
So, how does your proposal relate to HMAC?
As Tim said, it doesn't -- we aren't using keyed hashes, and we're
only concerned about preimage attacks (not collision or
second-preimage). Preimage attacks imply second-preimage attacks, and
second-preimage attacks imply collision attacks. Thus something
that's secure against collision is also secure against preimage and
second-preimage, but a function with collision and second-preimage
attacks might have no preimage attacks. For instance, MD5 has tons of
trivial collision attacks against it, but no preimage attacks (not
sure about second-preimage attacks offhand). Whirlpool has no known
collision attacks, and thus no known preimage or second-preimage
attacks.
On Thu, Aug 19, 2010 at 5:02 AM, Robert Rohde <rarohde(a)gmail.com> wrote:
Let me preface my comment by saying that I haven't studied WHIRLPOOL,
and the following may not apply to it at all.
However, it is known that some block cipher based hashes behave poorly
when fed repeated copies of the same block. In the worst cases the
hash space is substantially truncated from its full size (which
probably is not the case for any serious cryptographic hash function).
Under less severe cases, cryptanalysis can find a new block cipher W'
such that N applications of block cipher W is the same as one
application of W'. If WHIRLPOOL is vulnerable to that kind of attack
then it would negate the effect of using str_repeat in your code.
In principle, we could evade any concern like this by just using a
provably secure hash function. The usual reason not to use those is
that they're slow, but that's an advantage in our case. For instance
(from an exercise in my cryptography course), let p be a prime, q a
prime dividing p - 1, and let G be the subgroup of Z_p^* of order q.
Let g be a randomly chosen generator for G, and let a_1, ..., a_k be
randomly chosen elements of G. If x, y_1, ..., y_k are integers in
the range 1 to q, define H(x, y_1, ..., y_k) = g^x a_1^y_1 ...
a_k^y_k. Then under the discrete logarithm assumption for G, it's
easy to prove that H is collision-resistant. You can make similar
functions based on other hard problems --
http://en.wikipedia.org/wiki/Provably_secure_cryptographic_hash_function
gives another example using factorization, whose correctness is
probably more obvious.
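As a concrete toy illustration of the construction above, here is the k = 1 case with deliberately tiny, insecure parameters chosen only for demonstration (a real instance would need a prime of thousands of bits):

```php
<?php
// Toy instance of the discrete-log hash H(x, y) = g^x * a^y mod p,
// restricted to the order-q subgroup. Parameters are made up and
// far too small for real use.
$p = 467;   // prime
$q = 233;   // prime with q | p - 1  (466 = 2 * 233)
$g = 4;     // generator of the order-q subgroup (a quadratic residue)
$a = 9;     // "randomly chosen" subgroup element

// Square-and-multiply modular exponentiation; fine for small moduli
// where intermediate products fit in a native integer.
function powmod( $base, $exp, $mod ) {
    $result = 1;
    $base %= $mod;
    while ( $exp > 0 ) {
        if ( $exp & 1 ) {
            $result = ( $result * $base ) % $mod;
        }
        $base = ( $base * $base ) % $mod;
        $exp >>= 1;
    }
    return $result;
}

// A collision H(x, y) = H(x', y') with (x, y) != (x', y') would reveal
// the discrete log of $a to base $g, which is assumed to be hard.
function toyHash( $x, $y, $g, $a, $p ) {
    return ( powmod( $g, $x, $p ) * powmod( $a, $y, $p ) ) % $p;
}
```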
There are two problems with using such a function. One is that
there's probably no readily available implementation, so we'd have to
write our own, which might be vulnerable to side-channel attacks.
Another is that we're more worried about brute-forcing than about the
algorithm being broken, so we'd have to write it in C to avoid giving
one or two orders of magnitude advantage to the attacker, and then
shared hosts can't use it. Perhaps the way to solve both problems is
to submit the code for inclusion in future versions of the PHP hash
module, after at least casual review by some cryptographers. This
would require significant extra work, of course.
Another thing to consider is if we could pick a function that's
particularly inconvenient to execute on GPUs. Those are a great way
for crackers to easily outdo any CPU implementation.
For a basic implementation, anyway, Tim's proposal sounds like it's
much better than what we have now. Fancy stuff like provably correct
hash functions is probably overkill.
On Thu, Aug 19, 2010 at 5:02 AM, Robert Rohde <rarohde(a)gmail.com> wrote:
As a complementary approach it would be nice if there were something in
MediaWiki to aid in the selection of strong passwords. Regardless of
hash function, it will still take about two billion times longer to
find one 10 character password in [A-Za-z0-9] than to find a 6
character password in [a-z]. Even if password strength testing
algorithms were disabled on Wikipedia sites, it would still be a nice
addition to have in the MediaWiki codebase in general.
I agree we should have a good system available for those who want it,
and it should probably be enabled by default for sysops. However, as
I've said elsewhere, these strength testing things are
counterproductive for unprivileged users on open wikis. For such
users, a compromised account is approximately as bad as just
forgetting your password -- the result in either case is you just lose
access to the account. If it's compromised, it might be somewhat
worse (because the attacker might prevent an e-mail reset), but on the
other hand, forgetting your password is at least a couple orders of
magnitude more likely. As such, to minimize harm to users, we should
encourage them to use easy-to-remember passwords, not strong
passwords.
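Robert's search-space arithmetic holds up; a quick sanity calculation (the exact figure is closer to 2.7 billion):

```php
<?php
// Sanity check of the ratio quoted above: a 10-character password over
// [A-Za-z0-9] (62 symbols) versus a 6-character password over [a-z].
$strong = pow( 62, 10 );  // ~8.4e17 possibilities
$weak   = pow( 26, 6 );   // ~3.1e8 possibilities
printf( "ratio: %.2e\n", $strong / $weak );  // roughly 2.7e9
```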
On Thu, Aug 19, 2010 at 10:12 AM, Jonathan Leybovich <jleybov(a)yahoo.com> wrote:
What about using public key cryptography? Generate a key-pair and use the "public" key to produce your password hashes. Store the
private key offline in an underground vault just in case someday you'll need to
recover the original passwords in order to rehash them. Needless to say the key-pair must
be entirely for internal use and not already part of some PKI system (i.e. the basis for
one of Wikimedia's signed SSL certificates).
If you just delete the private key, this is the same as any old hash
function, but with various additional deficiencies. For instance,
public-key encryption is not designed to hide length information,
while hash functions are -- although due to padding, this might not be
a big deal. More seriously, public-key encryption schemes are
invariably probabilistic, which makes them useless for our purposes.
We'd have to adapt the scheme to remove any randomization. And there
might be other problems. There's really no reason to go to all this
trouble -- people have designed and studied dedicated hash functions
for a reason.
On Thu, Aug 19, 2010 at 10:50 AM, Ryan Lane <rlane32(a)gmail.com> wrote:
We could do a less secure, but more-secure-than-passwords alternative,
which is to use email or SMS as a one time password device. SMS is
obviously more secure than email, but would require us to ask people
for their phone numbers.
SMS has loads of vulnerabilities:
http://en.wikipedia.org/wiki/SMS#Vulnerabilities
I don't see anyone signing up to get an e-mail or text message every
time they want to log in, either. At best, this pushes the
authentication problem back to their e-mail or SMS provider. In that
case, why not just use OpenID?
We could also make a PKI infrastructure, and
allow certificate login, which is obviously safer than passwords.
Not if the private key is stored on the computer and the password isn't.
The real problem with any system stronger than passwords is that it
requires a level of complexity that would be difficult for us, and
either annoying or very confusing for users.
Yes, so let's not worry about it, shall we? We aren't the NSA here.
On Thu, Aug 19, 2010 at 2:18 PM, Jonathan Leybovich <jleybov(a)yahoo.com> wrote:
In that case you could always discard the private portion of the
key-pair to produce a strictly "one-way" function. And at least with
this scheme you always do have the option
of moving to 'C' regardless of whether it can accept the end-products of B as
inputs. Plus I would wager that asymmetric ciphers will stand up to attacks far
longer than most hashing functions.
As I noted above, there are hash functions whose security is provable
based on the exact same assumptions used to prove security of various
popular asymmetric encryption schemes. As I also noted above, there
are problems with naively trying to use public-key encryption instead
of hash functions. It makes more sense to just use known-secure hash
functions directly instead of trying to twist public-key encryption to
our needs, if we're that worried about Whirlpool (et al.) being broken
anytime soon.
For what it's worth, even ancient and thoroughly-broken hash functions
like MD4 don't have readily-usable preimage attacks. (MD4 was
published in 1990, broken in 1995, and only in 2008 was a preimage
attack published -- which still requires 2^102 evaluations, instead of
2^128.) We don't have to worry much about the hash functions being
broken.