On Feb 12, 2004, at 04:36, Andre Engels wrote:
"Axel Boldt" <axelboldt(a)yahoo.com>
schrieb:
> Do we forbid certain spiders access to the site based on User-Agent? A
> user in a German forum reported recently that he couldn't access
> Wikipedia at all, always receiving a "Forbidden" message. It turned out
> that his WebWasher proxy was to blame (an ad-banner blocker). The proxy
> sends the User-Agent
>
> "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt) WebWasher 3.0"
I can access pages on de.wikipedia.org using the above user-agent
string, but don't have WebWasher to test with.
WebWasher cannot be used to spider and download sites.
We forbid spiders based on User-Agent, but WebWasher seems not to be
in the list. According to http://www.wikipedia.org/robots.txt, the
following User-Agents are disallowed:
robots.txt doesn't enforce anything based on User-Agent. It works on
the honor system: if the client doesn't obey robots.txt, nothing in
robots.txt blocks it.
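For context, a robots.txt entry is only an advisory rule keyed on a
User-Agent token; the server does nothing to enforce it. A sketch of the
sort of entry involved (illustrative, not copied from wikipedia.org's
actual file):

```
User-agent: HTTrack
Disallow: /
```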
Here's the current user-agent block list:
# we don't like these user-agents:
acl stayaway browser ^Iala
acl stayaway browser ^Teleport
acl stayaway browser ^WebStripper
acl stayaway browser ^Snoopy
acl stayaway browser grub
acl stayaway browser ZyBorg
acl stayaway browser linko
acl stayaway browser FAST
acl stayaway browser HTTrack
acl stayaway browser Microsoft.URL.Control
acl stayaway browser ^Xenu
acl stayaway browser LARBIN
acl stayaway browser efp(a)gmx.net
acl stayaway browser larbin
acl stayaway browser ^LWP
acl stayaway browser libwww-perl
acl stayaway browser Python-urllib
acl stayaway browser ^WorQmada
acl stayaway browser ^TorQmada
acl stayaway browser ^k2spider
acl stayaway browser fetch.api.request
acl stayaway browser Zealbot
acl stayaway browser dloader
acl stayaway browser NaverRobot
acl stayaway browser Exalead
acl stayaway browser Fetch
acl stayaway browser Offline.Explorer
acl stayaway browser WWW-Mechanize
acl stayaway browser Downlad.Ninja
acl stayaway browser Web.Downloader
acl stayaway browser HTTrack
acl stayaway browser Sister.Site
acl stayaway browser WebReaper
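For readers unfamiliar with Squid's `browser` ACL type: each entry above
is a regular expression matched against the request's User-Agent header,
so `^Xenu` matches only at the start of the string while `libwww-perl`
matches anywhere. A minimal sketch of that matching logic in Python,
using a small subset of the list (the function name and list are
illustrative, not part of Squid):

```python
import re

# A few patterns copied from the blocklist above; Squid's "browser"
# ACL type treats each as a regex searched against the User-Agent header.
STAYAWAY = [r"^Xenu", r"libwww-perl", r"HTTrack", r"Python-urllib"]

def is_blocked(user_agent: str) -> bool:
    """Return True if any blocklist regex matches the User-Agent."""
    return any(re.search(pattern, user_agent) for pattern in STAYAWAY)
```

Note that `is_blocked` returns False for the WebWasher string quoted at
the top of the thread, which is consistent with it not being refused by
this list.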
Furthermore, I know that any request without a User-Agent is refused.
There might be others, but someone who knows more about it than me
should check that.
That hasn't been true for a week or so.
-- brion vibber (brion @ pobox.com)