There appears to be a problem on this page: http://www.wikipedia.org/wiki/User:TakuyaMurata
When we click on your link to http://kuwahala.free-city.net/nihonshi/nindex.html we get the error: 404 File not found
We last examined your page on Mon Sep 29, 2003 at 01:43:19 PM EDT. If it has not been updated since then, the link is most likely broken.
We discovered this error during our normal course of website content checking for one of our search engine clients.
If you would like to receive notifications like this in the future if we find pages unavailable, click here: http://scclick.internetseer.com/sitecheck/clickthrough.jsp?I5s57d5d5i5l5d5m5...
Click here to learn more about us: http://scclick.internetseer.com/sitecheck/clickthrough.jsp?I5s57d5d5i5l5d5m5...
Sincerely,
Connie Davis InternetSeer.com http://www.internetseer.com
------------------------------------------------------------------------------
Your email address was found during a prior visit to your website on 09-29-2003. The error listed above was verified from both of our indexing servers in Philadelphia, Pa. and Los Angeles, Ca. This error could have been caused by any number of events, including connectivity problems on our part and/or connectivity problems in the Internet as we tried to reach your site. This error should not be construed as a guaranteed problem on the part of your website or hosting company since there are never any guaranteed connection routes on the Internet.
If you would like to be excluded from any potential future contact, click here: http://scclick.internetseer.com/sitecheck/cancel.jsp?USQM7qi6MH5aI5sUSQLzVVS...
##wikien-l@wikipedia.org## SRC=58
If one of the list admins sent this through; please don't. It's *spam*.
They crawl the web for e-mail addresses and then send these things out as advertisements for their link-checking / availability service. Which, of course, means they're crawling our web site and loading it down. *Thanks* so much.
The HTML mail also contains a "web bug": a 1x1 pixel transparent GIF image which contains a unique ID number and additionally sets a cookie when you view it, which happens automatically on many popular e-mail clients.
-- brion vibber (brion @ pobox.com)
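For anyone who wants to check a message for the kind of "web bug" described above, here is a rough sketch in Python. It only flags <img> tags whose width and height attributes are both "1"; a real tracker could be sized with CSS instead, so treat it as a first pass rather than a reliable detector. The names TrackingPixelFinder and find_tracking_pixels, and the tracker.example URL, are made up for illustration.

```python
from html.parser import HTMLParser


class TrackingPixelFinder(HTMLParser):
    """Collects the src of every <img> declared as 1x1 pixels."""

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # Only explicit width="1" height="1" attributes are checked here.
        if attributes.get("width") == "1" and attributes.get("height") == "1":
            self.suspects.append(attributes.get("src", ""))


def find_tracking_pixels(html):
    """Return the src values of all 1x1 images found in an HTML body."""
    finder = TrackingPixelFinder()
    finder.feed(html)
    finder.close()
    return finder.suspects


# Hypothetical message body: one ordinary image and one 1x1 image with an ID.
sample_html = (
    '<html><body><p>There appears to be a problem on this page.</p>'
    '<img src="http://tracker.example/logo.gif" width="120" height="40">'
    '<img src="http://tracker.example/bug.gif?id=12345" width="1" height="1">'
    '</body></html>'
)

suspects = find_tracking_pixels(sample_html)
```

Reading mail as plain text avoids the problem altogether, since the image is never requested.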
Brion Vibber wrote:
If one of the list admins sent this through; please don't. It's *spam*.
They crawl the web for e-mail addresses and then send these things out as advertisements for their link-checking / availability service. Which, of course, means they're crawling our web site and loading it down. *Thanks* so much.
The HTML mail also contains a "web bug": a 1x1 pixel transparent GIF image which contains a unique ID number and additionally sets a cookie when you view it, which happens automatically on many popular e-mail clients.
This is one of those points where I give a rant about HTML email being pure evil. Thank whatever that OE finally got a 'View all email as plain text' option a while back. Considered dropping web traffic from them? Cutting off a misbehaving spider would be good... and this reminds me, let's make certain that the 'submit' button never, ever becomes a link. Else every bad spider will wreak havoc.
-- Jake
On Mon, 29 Sep 2003 11:07:26 -0700, Brion Vibber brion@pobox.com gave utterance to the following:
If one of the list admins sent this through; please don't. It's *spam*.
They crawl the web for e-mail addresses and then send these things out as advertisements for their link-checking / availability service. Which, of course, means they're crawling our web site and loading it down. *Thanks* so much.
I wonder if their crawler respects robots.txt? How hard would it be to id the crawler from server logs? Does W keep server logs?
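As a rough idea of what identifying a crawler from server logs could look like, here is a small Python sketch that tallies requests per User-Agent string. It assumes the server writes the common "combined" access log format, which the thread does not confirm, and the agent name ExampleLinkChecker/1.0 in the sample lines is invented; the real crawler's User-Agent would have to be read from the actual logs.

```python
import re
from collections import Counter

# Assumed "combined" log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def requests_per_agent(lines):
    """Count requests per User-Agent, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group("agent")] += 1
    return counts


# Hypothetical log lines, for illustration only.
sample_log = [
    '192.0.2.10 - - [29/Sep/2003:13:43:19 -0400] "GET /robots.txt HTTP/1.0" '
    '200 120 "-" "ExampleLinkChecker/1.0"',
    '192.0.2.10 - - [29/Sep/2003:13:43:20 -0400] "GET /wiki/User:TakuyaMurata HTTP/1.0" '
    '200 5120 "-" "ExampleLinkChecker/1.0"',
    '198.51.100.7 - - [29/Sep/2003:13:44:02 -0400] "GET /wiki/Main_Page HTTP/1.1" '
    '200 9000 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    'this line is not in the expected format',
]

counts = requests_per_agent(sample_log)
busiest = counts.most_common(1)
```

A crawler that identifies itself honestly shows up as one agent string with many hits; one that fakes a browser string would need to be spotted by IP address or request pattern instead.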
At 09:22 AM 9/30/03 +1200, Richard Grevers wrote:
On Mon, 29 Sep 2003 11:07:26 -0700, Brion Vibber brion@pobox.com gave utterance to the following:
If one of the list admins sent this through; please don't. It's *spam*.
They crawl the web for e-mail addresses and then send these things out as advertisements for their link-checking / availability service. Which, of course, means they're crawling our web site and loading it down. *Thanks* so much.
I wonder if their crawler respects robots.txt?
I would assume not--robots.txt is a protocol for honest people, not spammers and liars.
On Mon, 29 Sep 2003, Vicki Rosenzweig wrote:
At 09:22 AM 9/30/03 +1200, Richard Grevers wrote:
I wonder if their crawler respects robots.txt?
I would assume not--robots.txt is a protocol for honest people, not spammers and liars.
They claim it does. They did hit robots.txt, and their accesses today seem to be within the general settings (no access to /w/wiki.phtml?whatever). I've changed it to now disallow any access by them on their next visit, we'll see what happens.
-- brion vibber (brion @ pobox.com)
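To illustrate the kind of robots.txt change described above, here is a minimal Python sketch using the standard urllib.robotparser module. The rules shown are an assumption, not the site's actual file: the agent token InternetSeer and the /w/ prefix are guesses based on this thread, and OtherBot stands in for any other crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one crawler entirely, and keep everyone
# else out of the script path only.
ROBOTS_TXT = """\
User-agent: InternetSeer
Disallow: /

User-agent: *
Disallow: /w/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, url):
    """Report whether the parsed rules permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


article = "http://www.wikipedia.org/wiki/Main_Page"
script = "http://www.wikipedia.org/w/wiki.phtml?title=Main_Page"

blocked_everywhere = not allowed("InternetSeer", article)
others_read_articles = allowed("OtherBot", article)
others_kept_from_script = not allowed("OtherBot", script)
```

This only shows what a compliant crawler should conclude from the file. Whether a given crawler actually obeys it can only be seen in the server logs on its next visit.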