FWIW, two years ago we had a botnet that targeted Special:Random_article.
Other than that there is not much similarity.
It doubled all traffic to the Wiktionaries and added ~5% to overall traffic.
The botnet struck twice: first against the Portuguese Wiktionary only, then a few
months later spreading bogus requests over all Wiktionaries. Then it disappeared.
Both peaks can still be seen at
http://stats.wikimedia.org/wiktionary/EN/TablesPageViewsMonthlyOriginal.htm
Erik
-----Original Message-----
From: analytics-bounces(a)lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org]
On Behalf Of Faidon Liambotis
Sent: Tuesday, November 26, 2013 18:44
To: Operations Engineers; Analytics team
Subject: [Analytics] Really strange malformed requests since yesterday
Hi,
We've been having spikes in our 5xx error logs since yesterday. There are definitely
multiple distinct causes for those, incl. esams network issues, random people trying to
DoS us, MediaWiki bugs that were backported yesterday, etc.
Among the most peculiar causes of errors, though, are requests of this form:
    GET \
    ki/Random_article HTTP/1.1
    Host: en.wikipedia.org
    ...
That's GET, space, backslash, newline, ki/Random_article ("Random_article"
being an example). This makes Varnish take the URL to be "\" and treat
"ki/Random_article HTTP/1.1" as a malformed header, so it responds
with a 503 (not a 400; that's a bug of its own).
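To make the failure mode concrete, here is a minimal Python sketch of a naive line-based HTTP/1.x request-head parser. It is not Varnish's code. It assumes bare LF is accepted as a line terminator (RFC 7230 section 3.5 permits recipients to do this) and that a header line without a colon and a valid field name is malformed. Under those assumptions the request above yields a URL of "\", no HTTP version, and one malformed header, which a strict server would answer with 400. The function names are hypothetical.

```python
import re

# RFC 7230 "tchar" set: the characters allowed in a header field name.
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def parse_request_head(raw: bytes):
    """Naively split a raw HTTP/1.x request head (hypothetical helper).

    Accepts CRLF or bare LF as the line terminator, so 'GET \\' followed
    by LF ends up as a request line on its own.
    """
    lines = raw.decode("latin-1").replace("\r\n", "\n").split("\n")
    parts = lines[0].split(" ")
    method = parts[0]
    url = parts[1] if len(parts) > 1 else ""
    version = parts[2] if len(parts) > 2 else None  # missing in the bad case
    headers, malformed = {}, []
    for line in lines[1:]:
        if line == "":  # blank line terminates the header block
            break
        name, sep, value = line.partition(":")
        if sep and TOKEN.fullmatch(name):
            headers[name.lower()] = value.strip()
        else:
            malformed.append(line)  # e.g. 'ki/Random_article HTTP/1.1'
    return method, url, version, headers, malformed


def status_for(raw: bytes) -> int:
    """Status a strict server would return: 400 for a malformed head."""
    _, _, version, _, malformed = parse_request_head(raw)
    return 400 if version is None or malformed else 200


# GET, space, backslash, newline, then the rest of the request.
bad = b"GET \\\nki/Random_article HTTP/1.1\r\nHost: en.wikipedia.org\r\n\r\n"
good = b"GET /wiki/Random_article HTTP/1.1\r\nHost: en.wikipedia.org\r\n\r\n"
```

A real parser would likely reject the request line on its own, since it lacks an HTTP version; the sketch folds both checks into `status_for`.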
The first occurrence of such a request in our logs is at 2013-11-25T12:03:45. Before that we
had 0 (zero) such requests in our logs, in all of November that I checked. Since then
we've had 83,010 such requests (about 1/3 of our total 5xx).
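A minimal Python sketch of the kind of log query behind these numbers, assuming (hypothetically) that each log entry has already been parsed into an ISO-8601 timestamp, the URL as Varnish saw it, and the response status. It reports the first malformed request, how many there were, and their share of all 5xx responses. The entry format and the `scan` name are illustrative only.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple

MALFORMED_URL = "\\"  # Varnish parsed the URL as a single backslash


def scan(entries: Iterable[Tuple[str, str, int]]):
    """Return (first timestamp, malformed count, malformed share of 5xx).

    `entries` are hypothetical (iso_timestamp, url, status) tuples.
    """
    first: Optional[datetime] = None
    malformed = 0
    total_5xx = 0
    for ts, url, status in entries:
        if 500 <= status <= 599:
            total_5xx += 1
        if url == MALFORMED_URL:
            malformed += 1
            t = datetime.fromisoformat(ts)
            if first is None or t < first:
                first = t
    share = malformed / total_5xx if total_5xx else 0.0
    return first, malformed, share
```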
I've verified that these strange requests arrive directly at our frontends; they are not
passing through our SSL terminators or special proxies like Opera Mini. You can see e.g.
a sample filtered pcap at fenari.wikimedia.org:~faidon/malformed-GET-20131126.pcap (this
has private data, do not share). The packets' TCP checksums are, obviously, correct.
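Why a valid checksum matters: the TCP checksum covers the payload, so a valid checksum means the bytes were not corrupted on the wire after the segment was built. Whatever inserted the backslash and newline did so before the checksum was computed, either on the client or on a middlebox that rewrites payloads and recomputes checksums. A minimal Python sketch of the RFC 1071 ones'-complement checksum over the IPv4 pseudo-header follows; it handles IPv4 only, and the helper names are hypothetical.

```python
import ipaddress
import struct

TCP_PROTO = 6


def ones_complement_sum16(data: bytes) -> int:
    """RFC 1071 16-bit ones'-complement sum; odd lengths are zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the end-around carry
    return total


def pseudo_header(src: str, dst: str, tcp_len: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero, protocol, TCP length."""
    return (ipaddress.IPv4Address(src).packed
            + ipaddress.IPv4Address(dst).packed
            + struct.pack("!BBH", 0, TCP_PROTO, tcp_len))


def tcp_checksum(src: str, dst: str, segment: bytes) -> int:
    """Checksum for a segment whose checksum field (bytes 16-17) is zero."""
    s = ones_complement_sum16(pseudo_header(src, dst, len(segment)) + segment)
    return ~s & 0xFFFF


def tcp_checksum_ok(src: str, dst: str, segment: bytes) -> bool:
    """A segment verifies if the sum including its checksum field is all ones."""
    s = ones_complement_sum16(pseudo_header(src, dst, len(segment)) + segment)
    return s == 0xFFFF
```

IPv6 traffic would need the IPv6 pseudo-header instead, which has a different layout.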
These requests are always for en.wikipedia.org articles, never other languages or projects.
They come from all user agents and operating systems (so probably not malware). They
have all kinds of Referers, including internal links. About 3/4 come from Google, but
this isn't unusual. Some of them have proper Cookies, including session tokens and
such (so probably not just spoofed UAs).
There were 83,010 such requests from 21,193 unique IPs in 121 different countries. The
distribution by country is the most interesting part; the top 5 countries by unique IPs are:
18152 IN
271 PH
268 AE
228 MY
207 US
i.e. 85% of the unique IPs are in India (but not on any particular ISP), over a period of more than 24h.
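The per-country breakdown amounts to counting distinct IPs per country and sorting. A minimal Python sketch, assuming (hypothetically) that each request has already been resolved to an (ip, country_code) pair by some GeoIP lookup; the last line reproduces the India share from the totals above (18152 / 21193 is about 0.857).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def unique_ips_by_country(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct client IPs per country; repeat hits from one IP count once."""
    ips = defaultdict(set)
    for ip, country in records:
        ips[country].add(ip)
    return {country: len(s) for country, s in ips.items()}


def top_n(counts: Dict[str, int], n: int = 5) -> List[Tuple[str, int]]:
    """Countries sorted by unique-IP count, largest first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]


# India's share of unique IPs, using the totals quoted in this message.
india_share = 18152 / 21193
```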
The distribution of hits per datacenter is:
78938 eqiad (incl. 72516 for India)
4072 esams
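A quick consistency check on these figures, as plain arithmetic in Python on the numbers quoted in this message: the two datacenters together account for all 83,010 malformed requests, India is about 92% of eqiad's hits, and since the India count at esams isn't given, 72516 / 83010 (about 87%) is only a lower bound on India's share of all hits.

```python
hits_by_dc = {"eqiad": 78938, "esams": 4072}
india_eqiad = 72516  # India hits at eqiad; India hits at esams not given
total_hits = 83010

# Both datacenters together cover every malformed request.
assert sum(hits_by_dc.values()) == total_hits

india_share_of_eqiad = india_eqiad / hits_by_dc["eqiad"]
india_share_lower_bound = india_eqiad / total_hits
```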
I've been on this for some time and I'm currently out of ideas.
At this point, my only theory is that some popular CPE device or, alternatively, a
state surveillance device (e.g. BlueCoat) has gone haywire and is corrupting HTTP
requests (paranoia about state surveillance was one of the reasons I kept digging). Some
parts don't fit either theory: the traffic is spread across both DCs and multiple
countries, which argues against state surveillance, and the requests are too targeted
at enwiki for a CPE bug.
Other thoughts? Am I missing something completely obvious?
Regards,
Faidon
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics