On Tue, Nov 26, 2013 at 02:05:53PM -0500, Dan Andreescu wrote:
> Other thoughts? Am I missing something completely obvious?
> Could this be just a chain email that started in India, and has a malformed
> URL with a newline in it, like this:
> http://google-junk-that-is-too-big-to-fit-on-one-line-for-the-author.
> ..?...target=http://en.wikipedia.org/wi
> ki/Random_article
No. A browser would never send a malformed HTTP request, no matter what
you put in the URL bar. It could have been a browser bug, but we see
this with multiple browsers, so that's not it.
So, to reiterate:
- Malformed HTTP from multiple UA strings -> not a browser bug
- TCP checksum right -> not a network-level corruption
  (routers/switches/LVS)
- Caught with tcpdump as it enters the server -> not a Varnish bug,
  unlikely to be a kernel bug
- Happens on all Varnish caches even with different configuration -> not
  a kernel or firmware (e.g. NIC) heisenbug
- Only happens for Host: en.wikipedia.org -> suggests something smart
  enough to understand HTTP headers.
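
For anyone who wants to flag such requests in their own captures, here is
a minimal sketch of a well-formedness check on a raw request head. It is
not the tooling used for the analysis above; the function name
is_well_formed and the sample payloads are hypothetical, and the grammar
is deliberately simplified (for example, it rejects folded header lines).

```python
import re

# HTTP "token" characters, used for method names and header field names.
TOKEN = rb"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"

# Simplified request line: method SP request-target SP HTTP-version
REQUEST_LINE = re.compile(TOKEN + rb" \S+ HTTP/\d\.\d")
# Simplified header line: field-name ":" optional whitespace, then the value
HEADER_LINE = re.compile(TOKEN + rb":[ \t]*[^\r\n]*")


def is_well_formed(raw: bytes) -> bool:
    """Return True if the request head is a request line plus header lines."""
    head, sep, _body = raw.partition(b"\r\n\r\n")
    if not sep:
        return False  # the head was never terminated by an empty line
    lines = head.split(b"\r\n")
    if not REQUEST_LINE.fullmatch(lines[0]):
        return False
    return all(HEADER_LINE.fullmatch(line) for line in lines[1:])


# Hypothetical payloads, for illustration only.
good = b"GET /wiki/Random_article HTTP/1.1\r\nHost: en.wikipedia.org\r\n\r\n"
broken = b"GET /wi\r\nki/Random_article HTTP/1.1\r\nHost: en.wikipedia.org\r\n\r\n"
print(is_well_formed(good), is_well_formed(broken))
```

The broken sample only mimics the newline-in-URL idea from the quoted
message; the actual corruption seen on the wire may look different.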
So, it looks like it's something that sits between the browser and our
caches, does deep packet inspection, understands HTTP and has a bug
somewhere that corrupts it. It could be malware[1], a buggy CPE, a
content inspection/filtering appliance that updated itself, or a state
surveillance system malfunction.
The impact is not huge but it's not so small either. It's just very...
puzzling :)
Regards,
Faidon
[1]: Malware initially seemed unlikely due to the diversity of User-Agent
strings; however, further analysis showed that all the non-Windows ones
(even iPhone and Android) account for a tiny percentage of requests, so
it's at the top of my list now.
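
The per-platform breakdown mentioned in the footnote can be approximated
with a rough tally like the one below. This is only a sketch under stated
assumptions: the helper names platform_of and platform_share are
hypothetical, the substring matching is crude, and the sample User-Agent
strings are made up rather than taken from the real logs.

```python
from collections import Counter


def platform_of(user_agent: str) -> str:
    """Crudely classify a User-Agent string by platform substring."""
    ua = user_agent.lower()
    # Check the mobile platforms first, since Android UAs also say "Linux".
    for needle, name in (("iphone", "iPhone"),
                         ("android", "Android"),
                         ("windows", "Windows")):
        if needle in ua:
            return name
    return "other"


def platform_share(user_agents):
    """Return the fraction of requests seen per platform."""
    counts = Counter(platform_of(ua) for ua in user_agents)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


# Made-up sample: one User-Agent string per malformed request.
sample = ["Mozilla/5.0 (Windows NT 6.1)"] * 3 + [
    "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X)"
]
print(platform_share(sample))
```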