On Tue, Nov 26, 2013 at 02:05:53PM -0500, Dan Andreescu wrote:
> Other thoughts? Am I missing something completely obvious?
> Could this be just a chain email that started in India, and has a malformed URL with a newline in it, like this:
> http://google-junk-that-is-too-big-to-fit-on-one-line-for-the-author. ..?...target=http://en.wikipedia.org/wi ki/Random_article
No. A browser would never send a malformed HTTP request, no matter what you put in the URL bar. It could have been a browser bug, but we see this with multiple browsers, so that's not it.
So, to reiterate:
- Malformed HTTP from multiple UA strings -> not a browser bug
- TCP checksum right -> not a network-level corruption (routers/switches/LVS)
- Caught with tcpdump as it enters the server -> not a Varnish bug, unlikely to be a kernel bug
- Happens on all Varnish caches even with different configuration -> not a kernel or firmware (e.g. NIC) heisenbug
- Only happens for Host: en.wikipedia.org -> suggests something smart enough to understand HTTP headers.
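To make the "TCP checksum right" point concrete, here is a minimal Python sketch of what that check means. It is not the tooling used for the analysis above; the function names are made up for illustration, and it assumes IPv4 with the raw TCP segment (header plus payload) already extracted from a capture. If the checksum verifies, the bytes were not altered in flight after the sender computed it, which is why corruption by routers or switches is ruled out.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones' complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum_ok(src_ip: bytes, dst_ip: bytes, tcp_segment: bytes) -> bool:
    """Hypothetical helper: verify an IPv4 TCP segment's checksum.

    src_ip and dst_ip are the 4-byte packed addresses; tcp_segment is the
    TCP header plus payload exactly as captured (checksum field included).
    """
    # IPv4 pseudo-header: addresses, zero byte, protocol 6 (TCP), TCP length.
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(tcp_segment))
    # With the stored checksum included, a valid segment sums to zero.
    return internet_checksum(pseudo + tcp_segment) == 0
```

A segment that passes this check but still carries a malformed request was already malformed when its checksum was computed, i.e. by whatever last wrote the packet.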
So, it looks like it's something that sits between the browser and our caches, does deep packet inspection, understands HTTP and has a bug somewhere that corrupts it. It could be malware[1], a buggy CPE, a content inspection/filtering appliance that updated itself, or a malfunctioning state surveillance system.
The impact is not huge but it's not so small either. It's just very... puzzling :)
Regards,
Faidon
1: Malware initially seemed unlikely due to the diversity of User-Agent strings; however, further analysis showed that all the non-Windows ones (even iPhone & Android) make up a tiny percentage of requests, so it's at the top of my list now.
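
The kind of breakdown described in the footnote could be sketched roughly as follows. This is only an illustration, not the actual analysis: the function names and the substring rules are assumptions, and real User-Agent classification is messier than this.

```python
from collections import Counter


def platform_of(user_agent: str) -> str:
    """Hypothetical, deliberately crude platform guess from a User-Agent."""
    ua = user_agent.lower()
    # Order matters: Android UAs also contain "linux", and iPhone UAs
    # contain "like Mac OS X", so the more specific needles come first.
    for needle, label in (
        ("windows", "Windows"),
        ("android", "Android"),
        ("iphone", "iPhone"),
        ("mac os x", "Mac"),
        ("linux", "Linux"),
    ):
        if needle in ua:
            return label
    return "Other"


def platform_shares(user_agents):
    """Return each platform's fraction of the given User-Agent strings."""
    counts = Counter(platform_of(ua) for ua in user_agents)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}
```

Feeding it the User-Agent header of every malformed request would show whether the non-Windows share is small enough to be noise.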