Combined replies to various posts below.
Steve Summit wrote:
A few of us -- though I fear an inconsequential minority -- are concerned that this is a destabilizing change, being made in a hurry, by a top-10 website, with consequences that aren't easy to predict and (apparently) haven't even been thought about.
Not entirely an inconsequential minority. Google complained by email that it broke a Google Translate feature; they got an IP-based exemption while they develop and deploy a fix.
In another post:
Domas wrote:
Hi Steve,
But why?
Because we need to identify malicious behavior.
You're trying to detect / guard against malicious behavior using *User-Agent*?? Good grief. Have fun with the whack-a-mole game, then.
Well yeah. We've had malicious traffic in the past that hasn't been easily filterable by request headers. The response was to create a list of the IP addresses causing the most traffic and to block them at Squid. Squid is reasonably well-optimised for this; it stores blocked IPs and ranges in a tree, giving lookups in O(log N) time, where N is the number of blocked IPs.
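To illustrate why a sorted structure gives logarithmic lookups, here is a minimal Python sketch, assuming IPv4 only and blocked entries given as CIDR strings. It is not Squid's code, and `BlockList` / `is_blocked` are hypothetical names. It merges the blocked ranges into sorted, non-overlapping intervals and binary-searches them with `bisect`, so each lookup costs O(log N).

```python
import bisect
import ipaddress


class BlockList:
    """Hypothetical IPv4 block list: sorted, merged ranges, O(log N) lookup."""

    def __init__(self, cidrs):
        # Convert each CIDR block to an inclusive (start, end) integer range.
        ranges = []
        for cidr in cidrs:
            net = ipaddress.IPv4Network(cidr, strict=False)
            ranges.append((int(net.network_address), int(net.broadcast_address)))
        ranges.sort()
        # Merge overlapping or adjacent ranges so one binary search suffices.
        merged = []
        for start, end in ranges:
            if merged and start <= merged[-1][1] + 1:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        self._starts = [s for s, _ in merged]
        self._ends = [e for _, e in merged]

    def is_blocked(self, ip):
        addr = int(ipaddress.IPv4Address(ip))
        # Index of the last range whose start is <= addr (or -1 if none).
        i = bisect.bisect_right(self._starts, addr) - 1
        return i >= 0 and addr <= self._ends[i]


# Example using documentation-only address blocks (RFC 5737).
blocked = BlockList(["192.0.2.0/24", "198.51.100.7/32"])
print(blocked.is_blocked("192.0.2.55"))    # True
print(blocked.is_blocked("198.51.100.8"))  # False
```

Merging overlapping ranges up front is what lets a single binary search answer the question; a balanced tree gives the same O(log N) bound while also allowing cheap inserts and deletes as the list changes.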
That would have been more work, and I appreciate that the sysadmin team is small and needs to allocate their time carefully. It's not my job to tell them how to do that, and I wasn't offering to help.
But note that the action taken wasn't to block all list=search API queries that have a blank user agent header. The overly broad response should give you a hint that there was another motive at work.
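For comparison, the narrower rule mentioned above could be expressed as a predicate like the following. This is a minimal sketch, assuming the request's query string and headers are already available as a string and a dict; `is_blank_ua_search` is a hypothetical name, not anything in the Wikimedia configuration. It relies on the MediaWiki API convention that `list` is a submodule of `action=query` and that several list modules may be joined with `|`.

```python
from urllib.parse import parse_qs


def is_blank_ua_search(query_string, headers):
    """Hypothetical narrow filter: True only for action=query&list=search
    requests whose User-Agent header is missing or blank."""
    params = parse_qs(query_string)
    is_query = "query" in params.get("action", [])
    # Multiple list modules can be requested at once, e.g. list=search|allpages.
    lists = {name for value in params.get("list", []) for name in value.split("|")}
    # Assumes header names have already been normalised to "User-Agent".
    user_agent = headers.get("User-Agent", "")
    return is_query and "search" in lists and not user_agent.strip()


print(is_blank_ua_search("action=query&list=search&srsearch=foo", {}))  # True
print(is_blank_ua_search("action=query&list=search&srsearch=foo",
                         {"User-Agent": "ExampleBot/1.0"}))             # False
print(is_blank_ua_search("action=query&prop=info&titles=Foo", {}))      # False
```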
I think they want to make their work easier in the future. Although it doesn't help much with malicious traffic, requiring a User-Agent header does help to distinguish different sources of non-malicious but excessively expensive traffic.
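As an illustration of the attribution point, the following sketch sums a per-request cost by User-Agent. It assumes log records arrive as `(user_agent, cost)` pairs, which is a hypothetical shape, and the cost unit (say, backend milliseconds) is likewise an assumption; `cost_by_user_agent` is a hypothetical name.

```python
from collections import Counter


def cost_by_user_agent(log_records):
    """Sum cost per User-Agent, most expensive first.

    log_records: iterable of (user_agent, cost) pairs (hypothetical format).
    """
    totals = Counter()
    for user_agent, cost in log_records:
        # Missing or empty User-Agents all collapse into one anonymous bucket.
        totals[user_agent or "(none)"] += cost
    return totals.most_common()


records = [
    ("ExampleBot/1.0 (ops@example.org)", 120),
    ("", 300),
    ("ExampleBot/1.0 (ops@example.org)", 80),
    (None, 50),
]
print(cost_by_user_agent(records))
# [('(none)', 350), ('ExampleBot/1.0 (ops@example.org)', 200)]
```

Every client that omits the header lands in the same "(none)" bucket, so the operators can see that anonymous traffic is expensive but not whose it is.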
In another post:
When the new code blocks requests with missing User Agent strings (which is, oddly, not all of the time), it is with a 403 Forbidden response and the very simple message
Please provide a User-Agent header
(No <html> tags, no nothing.)
Be glad it doesn't just say "sigh".
if( $wgDBname == 'kuwiki' && preg_match( '/[[Image:Flag_of_Turkey.svg]]/', @$_REQUEST['wpTextbox1'] ) ) { die("Sigh.\n"); }
Seriously.
http://ku.wikipedia.org/w/index.php?wpTextbox1=%5B%5BImage:Flag_of_Turkey.sv...]]
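The blocking behaviour described earlier in the thread, a 403 Forbidden whose entire body is "Please provide a User-Agent header", can be sketched as WSGI middleware. This is an illustration only: the real check is not Python and its details aren't given in this thread. `require_user_agent`, `hello_app` and `call` are hypothetical names; treating a whitespace-only header as missing and sending `text/plain` are assumptions, since the thread doesn't say which response headers were used.

```python
def require_user_agent(app):
    """Hypothetical WSGI middleware: reject requests without a User-Agent."""
    def wrapper(environ, start_response):
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        if not environ.get("HTTP_USER_AGENT", "").strip():
            body = b"Please provide a User-Agent header"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)
    return wrapper


def hello_app(environ, start_response):
    """Stand-in for the real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def call(wsgi_app, environ):
    """Tiny harness: run a WSGI app and return (status, body)."""
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"] = status

    body = b"".join(wsgi_app(environ, start_response))
    return seen["status"], body


app = require_user_agent(hello_app)
print(call(app, {}))  # ('403 Forbidden', b'Please provide a User-Agent header')
print(call(app, {"HTTP_USER_AGENT": "ExampleBot/1.0"}))  # ('200 OK', b'ok')
```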
Domas Mituzas wrote:
Actually we had User-Agent header requirement for ages, it just failed to do what it had to do for a while. Consider this to be a bugfix.
For the record, I didn't like the idea the first time around either.
-- Tim Starling