On Sun, Oct 11, 2009 at 3:28 PM, Erik Zachte <erikzachte@infodisiac.com> wrote:
Any idea why there are so many TCP_DENIED/403 responses? Are these really failures?
Certain types of requests are blocked at the Squid level for various reasons. For instance, try wgetting Wikipedia; you'll get a 403 because the default UA headers for such things are blocked. (You're supposed to use a custom UA header, preferably with contact info, to make your script distinctive and easily blockable by itself if there's a problem.) Similarly, try something like this:
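(A rough sketch of the kind of request I mean, here in Python; the bot name and contact address are made up for illustration.) urllib's generic default User-Agent ("Python-urllib/x.y") tends to get rejected at the Squid level just like wget's default UA does:

    import urllib.request
    import urllib.error

    url = "http://en.wikipedia.org/wiki/Main_Page"

    # Fetch with the library's default User-Agent -- expect a 403 from the Squids.
    try:
        urllib.request.urlopen(url)
    except urllib.error.HTTPError as e:
        print(e.code)  # typically 403 Forbidden

    # Fetch with a distinctive UA plus contact info, so the script can be
    # identified (and, if necessary, blocked on its own) if it misbehaves.
    req = urllib.request.Request(
        url,
        headers={"User-Agent": "ExampleStatsBot/0.1 (stats@example.org)"},
    )
    print(urllib.request.urlopen(req).getcode())  # 200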
I assume this kind of thing is what causes those responses.
On Sun, Oct 11, 2009 at 8:12 PM, Robert Rohde <rarohde@gmail.com> wrote:
However, a logical guess would be that the Squids are configured to reject action=edit requests from search engine spiders and similar non-human processes. Since such things are not easily incorporated into robots.txt, blocking at the Squid layer would be a good way to stop such traffic from hitting the main servers. That would be my guess. I suspect others can give a more concrete answer.
Those things are all blocked in robots.txt:
User-agent: *
Disallow: /w/
That's part of why we use long URLs for everything but page views, so that they can be neatly blocked from spiders.
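To make that concrete, here's a quick sketch using Python's urllib.robotparser (Main_Page is just an example title): a page view like /wiki/Main_Page stays crawlable, while the equivalent /w/index.php URL for an edit falls under the Disallow rule.

    from urllib.robotparser import RobotFileParser

    # The relevant lines from robots.txt, as quoted above.
    rp = RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /w/",
    ])

    # Page views use the short /wiki/ form, so spiders may fetch them...
    print(rp.can_fetch("*", "http://en.wikipedia.org/wiki/Main_Page"))  # True

    # ...while edits, histories, etc. go through /w/index.php and are disallowed.
    print(rp.can_fetch(
        "*", "http://en.wikipedia.org/w/index.php?title=Main_Page&action=edit"))  # False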