On Tue, Nov 26, 2013 at 07:44:02PM +0200, Faidon Liambotis wrote:
That's "GET", space, backslash, newline, then "ki/Random_article"
("Random_article" being an example). This makes Varnish think the URL is
"\", and "ki/Random_article HTTP/1.1" some random malformed header, and
so it responds with a 503 (and not a 400 -- that's a bug of its own).
I debugged that last part, i.e. why we were emitting a 503 instead of a
400.
So, Varnish parses the request and makes the exact equivalent one to
Apache -- it doesn't choke on the input but is, amazingly, perfectly
happy with it. In turn, Apache sees a URL that neither starts with / nor
is *, and correctly returns an HTTP_BAD_REQUEST (400) internally
(server/core.c, line 3560). However, since "HTTP/X.X" is missing
from the first line, it interprets it as an HTTP/0.9 request and emits
the Bad Request HTML directly, conversing in perfectly valid HTTP/0.9(!).
Only HTTP/0.9 didn't have response headers at all, so that protocol
conversation is:
-> GET /
<- <!DOCTYPE html>
<- <html lang=en>
<- [...]
Varnish, a piece of software first released in 2006, does not support
HTTP/0.9. Therefore, it gets confused by the lack of response headers
and interprets this response as a backend error, ultimately throwing a
503.
I checked the Apache source; there is unfortunately no way to increase
the minimum supported protocol version to 1.0. I checked the Varnish
source as well; the protocol detection seems like a hardcoded sscanf(),
but it shouldn't be too difficult to fix. Ideally, we should fix Varnish
to drop HTTP/0.9 requests as malformed altogether, and obviously not
add backend HTTP/0.9 support. Any takers? :)
I also tested adding a
if (req.url == "\") {
    error 400 "Bad request";
}
to our VCL. It works as intended, but since these responses wouldn't be
logged 1:1 in our log infrastructure, I opted not to add this snippet.
The first occurrence of such a request in our logs is
2013-11-25T12:03:45. Before that we had 0 (zero) such requests in
our logs, for all of November that I checked. Since then and until
now we've had 83,010 such requests (about 1/3 of our total 5xx).
It seems this is gradually tapering off:
5xx.tsv.log-20131125.gz:0
5xx.tsv.log-20131126.gz:26837
5xx.tsv.log-20131127.gz:77202
5xx.tsv.log-20131128.gz:70318
5xx.tsv.log-20131129.gz:13118
5xx.tsv.log-20131130.gz:890
Whatever the cause for this was, breaking access to Wikipedia was
probably noticed and fixed by whoever introduced that bug in the first
place.
Regards,
Faidon