Hi,
whenever a request is sent to Wikipedia with the following User-Agent header, the result is a 403 error:
User-Agent: W3C-checklink/4.1 [4.14] libwww-perl/5.803
The resulting document has an XHTML 1.0 Strict doctype, but it is not valid XHTML 1.0 Strict; see http://topjaklont.student.utwente.nl/invalid.html and try to validate it.
Why is a 403 error served at all? As www.sp.nl shows, for example, W3C-checklink can obey robots.txt. It also sleeps one second between requests, so it does a fair job of throttling itself. What is the problem, then?
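For the record, the behaviour is easy to reproduce outside of checklink itself. A minimal sketch in Python (the article URL below is only an example; any page behaves the same way):

    import urllib.request
    import urllib.error

    # Send a request with the same User-Agent string that W3C-checklink uses.
    req = urllib.request.Request(
        "http://en.wikipedia.org/wiki/Main_Page",
        headers={"User-Agent": "W3C-checklink/4.1 [4.14] libwww-perl/5.803"},
    )

    try:
        with urllib.request.urlopen(req) as response:
            print("Status:", response.getcode())
    except urllib.error.HTTPError as err:
        # A blocked User-Agent shows up here, e.g. as code 403.
        print("Status:", err.code)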
regards, Gerrit Holl.
Gerrit Holl wrote:
> whenever a request is sent to Wikipedia with the following User-Agent header, the result is a 403 error:
> User-Agent: W3C-checklink/4.1 [4.14] libwww-perl/5.803
> [snip]
> Why is a 403 error served at all?
libwww-perl is blocked, as we have had a lot of trouble in the past with poorly behaved spiders.
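(The block is keyed on the User-Agent string. A minimal sketch of that kind of filtering, written as a tiny Python WSGI app purely for illustration, and not Wikipedia's actual configuration, would be:)

    # Refuse requests whose User-Agent contains a blocked substring.
    BLOCKED_AGENTS = ("libwww-perl",)  # substrings that trigger a 403

    def application(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if any(marker in ua for marker in BLOCKED_AGENTS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Requests with this User-Agent are refused.\n"]
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Hello.\n"]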
-- brion vibber (brion @ pobox.com)