Message: 7
Date: Wed, 17 Feb 2010 13:47:47 +1100
From: John Vandenberg jayvdb@gmail.com
Subject: Re: [Wikitech-l] User-Agent:
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Message-ID: deea21831002161847o2f64f736w37e5a448a7642a5e@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1
On Wed, Feb 17, 2010 at 1:00 PM, Anthony wikimail@inbox.org wrote:
On Wed, Feb 17, 2010 at 11:57 AM, Domas Mituzas midom.lists@gmail.com wrote:
Probably everything looks easier from your armchair. I'd love to have that view! :)
Then stop volunteering.
Did you miss the point?
The graphs provided in this thread clearly show that the solution had a positive & desired effect.
A few negative side-effects have been put forward, such as preventing browsing without a UA, but Domas has also indicated that other tech team members can overturn the change if they don't like it.
-- John Vandenberg
Hi,
Don't forget that some normal traffic was blocked by this unannounced change, e.g. Google's translate service. How much of the traffic reduction was from services like this? Some of the reduced traffic being cited as proof of the strategy's success is coming from valid services. Be careful, or soon you will be saying: "you are either with Wikimedia or with the terrorists" :)
cheers, Jamie
On Wed, Feb 17, 2010 at 2:51 PM, Jamie Morken jmorken@shaw.ca wrote:
Don't forget that some normal traffic was blocked by this unannounced change, e.g. Google's translate service. How much of the traffic reduction was from services like this? Some of the reduced traffic being cited as proof of the strategy's success is coming from valid services. Be careful, or soon you will be saying: "you are either with Wikimedia or with the terrorists" :)
With this solution, it is now possible to determine how much of the traffic was from valid services: Google Translate and other useful services will identify themselves, and the traffic graph will rise accordingly.
Note that I am not in favour of the solution, as it breaks backwards compatibility with tools. Unannounced breakages are especially nasty; for example, it could have been quite easy to shoot an email to Google asking that they add a user-agent for the unidentified traffic.
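For the tool operators themselves it is usually a one-line change. A rough sketch in Python of what identifying yourself looks like (the user-agent string, URL and contact address here are purely illustrative, not a required format):

import urllib.request

# Identify the tool and give the site operators a way to reach you.
# The exact string below is only an example.
headers = {
    "User-Agent": "ExampleTranslateProxy/1.0 (https://example.org/contact; ops@example.org)"
}
req = urllib.request.Request("https://en.wikipedia.org/wiki/Main_Page", headers=headers)
with urllib.request.urlopen(req) as resp:
    page = resp.read()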
I am even less in favour of Domas retiring to an armchair, and think that anyone suggesting that is deluding themselves about Wikimedia's need of Domas, and Domas' reason for volunteering.
-- John Vandenberg
On Tue, Feb 16, 2010 at 11:18 PM, John Vandenberg jayvdb@gmail.com wrote:
With this solution, it is now possible to determine how much of the traffic was from valid services: Google Translate and other useful services will identify themselves
And what separates google translate from other useful services which hotload Wikipedia (other than the $2 million, which is not to say that $2 million is a bad reason to separate it, but let's at least be honest if that's the reason)?
I am even less in favour of Domas retiring to an armchair, and think that anyone suggesting that is deluding themselves about Wikimedia's need of Domas, and Domas' reason for volunteering.
Well, I never did say I was in favor of it. I merely pointed out his hypocrisy in claiming that he would love to have it be so.
Anthony wrote:
Probably everything looks easier from your armchair. I'd love to have that view! :)
Then stop volunteering.
John Vandenberg wrote:
I am even less in favour of Domas retiring to an armchair, and think that anyone suggesting that is deluding themselves about Wikimedia's need of Domas, and Domas' reason for volunteering.
I think it's common knowledge among people who have been reading these lists for a long time, that Anthony has a serious deficit in his sarcasm detection department, and often gives inappropriate responses to sarcastic comments.
This puts him on a collision course with Domas, whose posts are sarcastic about 80% of the time.
You should try not to make a big deal about it every time it happens.
-- Tim Starling
On Tue, Feb 16, 2010 at 11:32 PM, Tim Starling tstarling@wikimedia.org wrote:
I think it's common knowledge among people who have been reading these lists for a long time, that Anthony has a serious deficit in his sarcasm detection department, and often gives inappropriate responses to sarcastic comments.
Hah, I need to put that in my tagline or something.
Well, thanks for the defense, I guess. Or were you being sarcastic?
In the interest of proactive discussion (rather than griping), why don't we discuss better ways to manage bad bots, etc.
I don't know what internal tools currently exist but it seems to me like there ought to be better opportunities for traffic monitoring than UA blocks. For example, we have the Squid logs that are used to make page hit counts. My recollection is that the raw form of those logs include IP addresses (which are of course removed before aggregate data is provided to the public). If the IPs are logged, it should be straightforward to use hits per hour per IP in order to identify the top traffic generators. Someone on the inside could then inspect the biggest traffic generators and create white lists and black lists. Maybe something like this is already done.
I assume most of the legitimate sources of large traffic loads are generally pretty stable, so it wouldn't be hard to create automatic monitoring that provided an alert when a new IP entered the list of the top 100 traffic generators (for example).
I would generally assume that directly detecting which requestors are responsible for the highest loads would accomplish more than using a meta characteristic like UA strings to try and find problems. (Not that IP monitoring alone is sufficient either.)
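To make that concrete, here is a rough sketch of what the counting and alerting could look like, assuming the client IP is the first whitespace-separated field of the raw Squid log and that the previous hour's top list is kept in a flat file (both are guesses on my part, since I don't know the internal tooling):

import sys
from collections import Counter

TOP_N = 100

def top_ips(log_path, n=TOP_N):
    # Count requests per client IP in one hour's worth of raw Squid log.
    # Assumes the client IP is the first whitespace-separated field;
    # adjust the index for the actual log format.
    counts = Counter()
    with open(log_path) as log:
        for line in log:
            fields = line.split()
            if fields:
                counts[fields[0]] += 1
    return counts.most_common(n)

def new_entrants(current, previous_ips):
    # Return IPs in the current top list that were not there last hour.
    return [ip for ip, count in current if ip not in previous_ips]

if __name__ == "__main__":
    current = top_ips(sys.argv[1])
    try:
        # previous.txt: one IP per line, written out on the previous run.
        with open("previous.txt") as f:
            previous = set(f.read().split())
    except FileNotFoundError:
        previous = set()
    for ip in new_entrants(current, previous):
        print("new heavy hitter:", ip)
    with open("previous.txt", "w") as f:
        f.write("\n".join(ip for ip, count in current))

Run hourly over the most recent log slice; anything that newly enters the top list gets flagged for a human to inspect and whitelist or blacklist.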
-Robert Rohde
That would work if single IPs were the main cause. In practice the load comes from wide distributions of IPs, which leads to either blocking whole ISP ranges or being unable to easily identify DDoS-like behavior.
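One way to make per-IP counting catch that pattern would be to roll the counts up to network prefixes rather than single addresses. A sketch along these lines, assuming IPv4 and a /24 grouping (the prefix length is an arbitrary choice):

import ipaddress
from collections import Counter

def top_prefixes(ip_counts, prefix_len=24, n=20):
    # ip_counts: mapping of IPv4 address string -> request count.
    # Rolling the counts up to networks means a whole ISP range full of
    # moderate clients stands out even when no single IP does.
    by_prefix = Counter()
    for ip, hits in ip_counts.items():
        net = ipaddress.ip_network("%s/%d" % (ip, prefix_len), strict=False)
        by_prefix[str(net)] += hits
    return by_prefix.most_common(n)

# Example: two moderate IPs in the same /24 combine into one heavy prefix.
print(top_prefixes({"192.0.2.10": 400, "192.0.2.77": 350, "203.0.113.5": 500}))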
John
On Tue, Feb 16, 2010 at 11:22 PM, Robert Rohde rarohde@gmail.com wrote:
In the interest of proactive discussion (rather than griping), why don't we discuss better ways to manage bad bots, etc.
I don't know what internal tools currently exist but it seems to me like there ought to be better opportunities for traffic monitoring than UA blocks. For example, we have the Squid logs that are used to make page hit counts. My recollection is that the raw form of those logs include IP addresses (which are of course removed before aggregate data is provided to the public). If the IPs are logged, it should be straightforward to use hits per hour per IP in order to identify the top traffic generators. Someone on the inside could then inspect the biggest traffic generators and create white lists and black lists. Maybe something like this is already done.
I assume most of the legitimate sources of large traffic loads are generally pretty stable, so it wouldn't be hard to create automatic monitoring that provided an alert when a new IP entered the list of the top 100 traffic generators (for example).
I would generally assume that directly detecting which requestors are responsible for the highest loads would accomplish more than using a meta characteristic like UA strings to try and find problems. (Not that IP monitoring alone is sufficient either.)
-Robert Rohde