On Thu, Aug 19, 2010 at 2:37 AM, Tim Starling tstarling@wikimedia.org wrote:
The problem with the standard key strengthening algorithms, e.g. PBKDF1, is that they are not efficient in PHP. We don't want a C implementation of our scheme to be orders of magnitude faster than our PHP implementation, because that would allow brute-forcing to be more feasible than is necessary.
The idea I came up with is to hash the output of str_repeat(). This increases the number of rounds of the compression function, while avoiding tight loops in PHP code.
Seems reasonable.
PHP's hash extension has been available by default since PHP 5.1.2, and we can always fall back to using B-type hashes if it's explicitly disabled. The WHIRLPOOL hash is supported. It has no patent or copyright restrictions so it's not going to be yanked out of Debian or PHP for legal reasons. It has a 512-bit block size, the largest of any hash function available in PHP, and its security goals state that it can be truncated without compromising its properties.
My proposed hash function is a B-type MD5 salted hash, which is then further hashed with a configurable number of invocations of WHIRLPOOL, with a 256-bit substring taken from a MediaWiki-specific location. The input to each WHIRLPOOL operation is expanded by a factor of 100 with str_repeat().
The number of WHIRLPOOL iterations is specified in the output string as a base-2 logarithm (whimsically padded out to 3 decimal digits to allow for future universe-sized computers). This number can be upgraded by taking the hash part of the output and applying more rounds to it. A count of 2^7 = 128 gives a time of 55ms on my laptop, and 12ms on one of our servers, so a reasonable default is probably 2^6 or 2^7.
That seems reasonable. It could probably be done a lot faster on GPUs, I guess.
On Thu, Aug 19, 2010 at 4:45 AM, Daniel Kinzler daniel@brightbyte.de wrote:
I don't know that much about the mathematical details of hashing, but i'd like to drop a pointer to an article if found interesting in this context:
"Stop using unsafe keyed hashes, use HMAC" http://rdist.root.org/2009/10/29/stop-using-unsafe-keyed-hashes-use-hmac/
So, how does your proposal relate to HMAC?
As Tim said, it doesn't -- we aren't using keyed hashes, and we're only concerned about preimage attacks (not collision or second-preimage). Preimage attacks imply second-preimage attacks, and second-preimage attacks imply collision attacks. Thus something that's secure against collision is also secure against preimage and second-preimage, but a function with collision and second-preimage attacks might have no preimage attacks. For instance, MD5 has tons of trivial collision attacks against it, but no preimage attacks (not sure about second-preimage attacks offhand). Whirlpool has no known collision attacks, and thus no known preimage or second-preimage attacks.
On Thu, Aug 19, 2010 at 5:02 AM, Robert Rohde rarohde@gmail.com wrote:
Let me preface my comment by saying that I haven't studied WHIRLPOOL, and the following may not apply to it at all.
However, it is known that some block cypher based hashes behave poorly when fed repeated copies of the same block. In the worst cases the hash space is substantially truncated from its full size (which probably is not the case for any serious cryptographic hash function). Under less severe cases, cryptanalysis can find a new block cipher W' such that N applications of block cipher W is the same as one application of W'. If WHIRLPOOL is vulnerable to that kind of attack then it would negate the effect of using str_repeat in your code.
In principle, we could evade any concern like this by just using a provably secure hash function. The usual reason not to use those is that they're slow, but that's an advantage in our case. For instance (from an exercise in my cryptography course), let p be a prime, q a prime dividing p - 1, and let G be the subgroup of Z_p^* of order q. Let g be a randomly chosen generator for G, and let a_1, ..., a_k be randomly chosen elements of G. If x, y_1, ..., y_k are integers in the range 1 to q, define H(x, y_1, ..., y_k) = g^x a_1^y_1 ... a_k^y_k. Then under the discrete logarithm assumption for G, it's easy to prove that H is collision-resistant. You can make similar functions based on other hard problems -- http://en.wikipedia.org/wiki/Provably_secure_cryptographic_hash_function gives another example using factorization, whose correctness is probably more obvious.
There are two problems with using such a function. One is that there's probably no readily available implementation, so we'd have to write our own, which might be vulnerable to side-channel attacks. Another is that we're more worried about brute-forcing than about the algorithm being broken, so we'd have to write it in C to avoid giving one or two orders of magnitude advantage to the attacker, and then shared hosts can't use it. Perhaps the way to solve both problems is to submit the code for inclusion in future versions of the PHP hash module, after at least casual review by some cryptographers. This would require significant extra work, of course.
Another thing to consider is if we could pick a function that's particularly inconvenient to execute on GPUs. Those are a great way for crackers to easily outdo any CPU implementation.
For a basic implementation, anyway, Tim's proposal sounds like it's much better than what we have now. Fancy stuff like provably correct hash functions is probably overkill.
On Thu, Aug 19, 2010 at 5:02 AM, Robert Rohde rarohde@gmail.com wrote:
As a complementary approach it would be nice if there was something in Mediawiki to aid in the selection of strong passwords. Regardless of hash function, it will still take about two billion times longer to find one 10 character password in [A-Za-z0-9] as it does to find a 6 character password in [a-z]. Even if password strength testing algorithms were disabled on Wikipedia sites, it would still be a nice addition to have in the Mediawiki codebase in general.
I agree we should have a good system available for those who want it, and it should probably be enabled by default for sysops. However, as I've said elsewhere, these strength testing things are counterproductive for unprivileged users on open wikis. For such users, a compromised account is approximately as bad as just forgetting your password -- the result in either case is you just lose access to the account. If it's compromised, it might be somewhat worse (because the attacker might prevent an e-mail reset), but on the other hand, forgetting your password is at least a couple orders of magnitude more likely. As such, to minimize harm to users, we should encourage them to use easy-to-remember passwords, not strong passwords.
On Thu, Aug 19, 2010 at 10:12 AM, Jonathan Leybovich jleybov@yahoo.com wrote:
What about using public key cryptography? Generate a key-pair and use the "public" key to produce your password hashes. Store the private key offline in an underground vault just in case someday you'll need to recover the original passwords in order to rehash them. Needless to say the key-pair must be entirely for internal use and not already part of some PKI system (i.e. the basis for one of Wikimedia's signed SSL certificates).
If you just delete the private key, this is the same as any old hash function, but with various additional deficiencies. For instance, public-key encryption is not designed to hide length information, while hash functions are -- although due to padding, this might not be a big deal. More seriously, public-key encryption schemes are invariably probabilistic, which makes them useless for our purposes. We'd have to adapt the scheme to remove any randomization. And there might be other problems. There's really no reason to go to all this trouble -- people have designed and studied dedicated hash functions for a reason.
On Thu, Aug 19, 2010 at 10:50 AM, Ryan Lane rlane32@gmail.com wrote:
We could do a less secure, but more-secure-than-passwords alternative, which is to use email or SMS as a one time password device. SMS is obviously more secure than email, but would require us to ask people for their phone numbers.
SMS has loads of vulnerabilities:
http://en.wikipedia.org/wiki/SMS#Vulnerabilities
I don't see anyone signing up to get an e-mail or text message every time they want to log in, either. At best, this pushes the authentication problem back to their e-mail or SMS provider. In that case, why not just use OpenID?
We could also make a PKI infrastructure, and allow certificate login, which is obviously safer than passwords.
Not if the password is not stored on the computer and the private key is.
The real problem with any system stronger than passwords, is that it requires a level of complexity that would be difficult for us, and either annoying or very confusing for users.
Yes, so let's not worry about it, shall we? We aren't the NSA here.
On Thu, Aug 19, 2010 at 2:18 PM, Jonathan Leybovich jleybov@yahoo.com wrote:
In that case you could always discard the private portion of the key-pair to produce a strictly "one-way" function. And at least with this scheme you always do have the option of moving to 'C' regardless of whether it can accept the end-products of B as inputs. Plus I would wager that asymmetric ciphers will stand up to attacks far longer than most hashing functions.
As I noted above, there are hash functions whose security is provable based on the exact same assumptions used to prove security of various popular asymmetric encryption schemes. As I also noted above, there are problems with naively trying to use public-key encryption instead of hash functions. It makes more sense to just use known-secure hash functions directly instead of trying to twist public-key encryption to our needs, if we're that worried about Whirlpool (et al.) being broken anytime soon.
For what it's worth, even ancient and thoroughly-broken hash functions like MD4 don't have readily-usable preimage attacks. (MD4 was published in 1990, broken in 1995, and only in 2008 was a preimage attack published -- which still requires 2^102 evaluations, instead of 2^128.) We don't have to worry much about the hash functions being broken.