"Axel Boldt" <axelboldt(a)yahoo.com> wrote:
> Do we forbid certain spiders access to the site based on User-Agent? A
> user in a German forum recently reported that he couldn't access
> Wikipedia at all, always receiving a "Forbidden" message. It turned out
> that his WebWasher proxy (an ad-banner blocker) was to blame. The proxy
> sends the User-Agent
> "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt) WebWasher 3.0"
> WebWasher cannot be used to spider and download sites.
We do forbid spiders based on User-Agent, but WebWasher does not seem to
be in the list. According to http://www.wikipedia.org/robots.txt, the
following User-Agents are disallowed:
UbiCrawler
DOC
Zao
sitecheck.internetseer.com
Zealbot
MSIECrawler
SiteSnagger
WebStripper
WebCopier
Fetch
Offline Explorer
Teleport
TeleportPro
WebZIP
linko
HTTrack
Microsoft.URL.Control
Xenu
larbin
libwww
ZyBORG
Download Ninja
wget
grub-client
k2spider
NPBot
Furthermore, I know that any request without a User-Agent is refused.
There might be others, but someone who knows more about this than I do
should check.
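To illustrate the point, here is a minimal sketch (not the actual server
code, and assuming simple case-insensitive substring matching) of a
filter that refuses the listed User-Agents and any empty User-Agent. Note
that the WebWasher string from the report passes through it:

```python
# Hypothetical sketch of a server-side User-Agent filter, using the
# disallowed names listed in http://www.wikipedia.org/robots.txt.
# Matching strategy (substring, case-insensitive) is an assumption.
BLOCKED_AGENTS = [
    "UbiCrawler", "DOC", "Zao", "sitecheck.internetseer.com", "Zealbot",
    "MSIECrawler", "SiteSnagger", "WebStripper", "WebCopier", "Fetch",
    "Offline Explorer", "Teleport", "TeleportPro", "WebZIP", "linko",
    "HTTrack", "Microsoft.URL.Control", "Xenu", "larbin", "libwww",
    "ZyBORG", "Download Ninja", "wget", "grub-client", "k2spider", "NPBot",
]

def is_forbidden(user_agent):
    """Return True if the request should get a 403 Forbidden."""
    if not user_agent:
        # Requests without a User-Agent header are refused outright.
        return True
    ua = user_agent.lower()
    return any(name.lower() in ua for name in BLOCKED_AGENTS)

# The WebWasher proxy's User-Agent matches nothing in the list,
# so this filter alone would not explain the user's 403.
ua = "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt) WebWasher 3.0"
print(is_forbidden(ua))  # → False
```

So whatever blocked that user, it was apparently not this list; the
block must come from some other rule.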
Andre Engels