In the interest of proactive discussion (rather than griping), why
don't we discuss better ways to manage bad bots, etc.
I don't know what internal tools currently exist but it seems to me
like there ought to be better opportunities for traffic monitoring
than UA blocks. For example, we have the Squid logs that are used to
make page hit counts. My recollection is that the raw form of those
logs include IP addresses (which are of course removed before
aggregate data is provided to the public). If the IPs are logged, it
should be straightforward to count hits per hour per IP to identify
the top traffic generators. Someone on the inside could then inspect
the biggest traffic generators and build whitelists and blacklists
from them. Maybe something like this is already done.
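As a rough sketch of what I mean (this assumes Squid's native access
log format, where the client IP is the third whitespace-separated
field; the field position would need adjusting for other log formats):

```python
from collections import Counter

def top_talkers(log_lines, n=100, ip_field=2):
    # Tally one hit per log line against the client IP field,
    # then return the n busiest IPs with their hit counts.
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) > ip_field:
            counts[fields[ip_field]] += 1
    return counts.most_common(n)

# Hypothetical sample lines in Squid native format:
sample = [
    "1234567890.123  45 10.0.0.1 TCP_MISS/200 1024 GET http://example.org/ - DIRECT/- text/html",
    "1234567891.456  12 10.0.0.2 TCP_HIT/200 512 GET http://example.org/a - NONE/- text/html",
    "1234567892.789  30 10.0.0.1 TCP_MISS/200 2048 GET http://example.org/b - DIRECT/- text/html",
]
print(top_talkers(sample, n=2))  # [('10.0.0.1', 2), ('10.0.0.2', 1)]
```

Running that over one hour's worth of log lines would give exactly the
hits-per-hour-per-IP ranking described above.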
I assume most legitimate sources of large traffic loads are fairly
stable over time, so it wouldn't be hard to create automatic
monitoring that raises an alert whenever a new IP enters the list of
the top 100 traffic generators (for example).
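The alert itself could be as simple as diffing the current top-N list
against the previous one (a minimal sketch, assuming we keep
yesterday's list around somewhere):

```python
def new_top_entrants(previous_top, current_top):
    # Flag any IP that appears in the current top-N list but not the
    # previous one -- these are the candidates for manual inspection.
    return sorted(set(current_top) - set(previous_top))

# Hypothetical top-N lists from two consecutive runs:
yesterday = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
today = ["10.0.0.1", "10.0.0.4", "10.0.0.2"]
print(new_top_entrants(yesterday, today))  # ['10.0.0.4']
```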
I would generally assume that directly detecting which requesters are
responsible for the highest loads would accomplish more than using a
meta-characteristic like UA strings to try to find problems. (Not
that IP monitoring alone is sufficient either.)
-Robert Rohde