Combined replies to various posts below.
Steve Summit wrote:
> A few of us -- though I fear an inconsequential minority -- are
> concerned that this is a destabilizing change, being made in a hurry,
> by a top-10 website, with consequences that aren't easy to predict and
> (apparently) haven't even been thought about.
Not entirely an inconsequential minority. Google complained by email
that it broke a Google Translate feature; they got an IP-based
exemption while they develop and deploy a fix.
In another post:
> Domas wrote:
> > Hi Steve,
> > > > But why?
> >
> > Because we need to identify malicious behavior.
> You're trying to detect / guard against malicious behavior using
> *User-Agent*?? Good grief. Have fun with the whack-a-mole game, then.
Well yeah. We've had malicious traffic in the past that hasn't been
easily filterable by request headers. The response was to create a
list of the IP addresses causing the most traffic and to block them at
Squid. Squid is reasonably well-optimised for this: it stores blocked
IPs and ranges in a tree, giving you O(log N) lookup time, where N is
the number of blocked entries.
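
To illustrate why that lookup stays cheap as the blocklist grows, here
is a minimal sketch. It is not Squid's code: it swaps the tree for a
sorted list plus binary search, which gives the same O(log N) bound.
The BlockList class, its method names and the sample addresses are all
hypothetical, and it handles IPv4 only.

```python
import bisect
import ipaddress


class BlockList:
    """Blocked IPv4 addresses and ranges, kept sorted for binary search.

    Illustration only: Squid uses a tree internally; a sorted list with
    bisect is a stand-in that has the same O(log N) lookup cost.
    """

    def __init__(self, cidrs):
        # collapse_addresses merges overlapping and adjacent networks and
        # yields them in sorted order, so the resulting ranges are disjoint.
        nets = ipaddress.collapse_addresses(
            ipaddress.ip_network(c) for c in cidrs
        )
        ranges = [(int(n.network_address), int(n.broadcast_address))
                  for n in nets]
        self._starts = [lo for lo, _ in ranges]
        self._ends = [hi for _, hi in ranges]

    def is_blocked(self, addr):
        ip = int(ipaddress.ip_address(addr))
        # Binary search for the last range that starts at or before ip.
        i = bisect.bisect_right(self._starts, ip) - 1
        return i >= 0 and ip <= self._ends[i]


# Hypothetical entries, taken from the documentation address ranges.
blocked = BlockList(["192.0.2.0/24", "198.51.100.7", "203.0.113.0/25"])
print(blocked.is_blocked("192.0.2.200"))
print(blocked.is_blocked("198.51.100.8"))
```

A single address is stored as a one-element range, so individual IPs
and whole ranges share the same lookup path.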
That would have been more work, and I appreciate that the sysadmin
team is small and needs to allocate their time carefully. It's not my
job to tell them how to do that and I wasn't offering to help.
But note that the action taken wasn't to block all list=search API
queries that have a blank user agent header. The overly broad response
should give you a hint that there was another motive at work.
I think they want to make their work easier in the future. Although it
doesn't help much with malicious traffic, requiring a User-Agent
header does help to distinguish different sources of non-malicious but
excessively expensive traffic.
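
As a rough sketch of what that distinction buys you, assume a
hypothetical log of (user_agent, cost_ms) pairs; the record format, the
function name and the sample values are made up for illustration and
are not taken from any real Wikimedia log.

```python
from collections import Counter


def cost_by_user_agent(requests):
    """Sum request cost per User-Agent, most expensive first.

    `requests` is an iterable of (user_agent, cost_ms) pairs; this
    record format is an assumption made for the example.
    """
    totals = Counter()
    for user_agent, cost_ms in requests:
        # Missing or blank headers all land in one bucket, which is
        # exactly the traffic that cannot be told apart.
        key = (user_agent or "").strip() or "(blank)"
        totals[key] += cost_ms
    return totals.most_common()


# Hypothetical sample data.
sample = [
    ("ExampleBot/1.0", 900),
    ("", 400),
    ("ExampleBot/1.0", 300),
    (None, 100),
    ("Mozilla/5.0", 50),
]
for agent, total in cost_by_user_agent(sample):
    print(agent, total)
```

Every client that sends no header ends up in the "(blank)" bucket, so
an expensive but well-meaning script there cannot be singled out or
contacted without falling back to IP addresses.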
In another post:
> When the new code blocks requests with missing User Agent strings
> (which is, oddly, not all of the time), it is with a 403 Forbidden
> response and the very simple message
>
>     Please provide a User-Agent header
>
> (No <html> tags, no nothing.)
Be glad it doesn't just say "sigh".
    if( $wgDBname == 'kuwiki'
        && preg_match( '/\[\[Image:Flag_of_Turkey.svg\]\]/',
            @$_REQUEST['wpTextbox1'] ) )
    {
        die("Sigh.\n");
    }
Seriously.
http://ku.wikipedia.org/w/index.php?wpTextbox1=[[Image:Flag_of_Turkey.svg]]
Domas Mituzas wrote:
> Actually we had User-Agent header requirement for ages, it just
> failed to do what it had to do for a while. Consider this to be a
> bugfix.
For the record, I didn't like the idea the first time around either.
-- Tim Starling