On Thu, 2003-03-20 at 22:52, Zoe wrote:
> sannse <sannse@delphiforums.com> wrote:
>> I just did a search on Google for "Okhrana" and came up with www.wikipedia.org/w/wiki.phtml?title=Okhrana&action=edit as the 10th hit. But that's a link to an edit page
>
> I think the problem was that Google had cached our page, and I just deleted it. You therefore got sent to a nonexistent entity.
That wouldn't have gone to an edit page, just to a blank page. The problem here is that an actual edit URL got into Google's index at some point and is still coming up in results.
Sannse, we *do* exclude edit pages from Google's and other search engines' spiders, doubly:
* robots.txt excludes access to the /w/ subdirectory, and thus all direct script actions (edits, histories, diffs, printable mode, changing options/length on recentchanges, etc.), so spiders shouldn't be touching them at all.
* edit pages and such have meta tags telling robots "noindex,nofollow"; i.e. if a robot does end up with such a page, it shouldn't index it and shouldn't follow links from it, but should just toss the page out and go back where it came from.
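For anyone who wants to check the two exclusions by hand, here is a rough Python sketch of both tests. It is not the code the site runs. The robots.txt text, the sample edit-page HTML, the /wiki/Okhrana article URL and the helper names (may_fetch, robots_directives) are all assumptions made up for illustration; only the /w/ exclusion and the "noindex,nofollow" value come from the description above.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: the only known fact is that /w/ is excluded.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""

# The edit URL from the search result, plus a hypothetical article URL.
EDIT_URL = "http://www.wikipedia.org/w/wiki.phtml?title=Okhrana&action=edit"
ARTICLE_URL = "http://www.wikipedia.org/wiki/Okhrana"

# Assumed shape of an edit page's <head>; only the meta content is known.
EDIT_PAGE_HTML = (
    '<html><head><meta name="robots" content="noindex,nofollow">'
    "</head><body></body></html>"
)


def may_fetch(url, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if robots_txt lets the given agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots meta directives found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    print("edit URL fetchable:", may_fetch(EDIT_URL))
    print("article URL fetchable:", may_fetch(ARTICLE_URL))
    print("edit page directives:", sorted(robots_directives(EDIT_PAGE_HTML)))
```

A well-behaved spider would stop at the first check and never request the edit URL. The meta tag only matters as a fallback if the page gets fetched anyway.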
A few have somehow gotten through. I'm not sure how. They may be old and not yet flushed (Googlebot is still going over the site and hasn't reindexed every page yet). Note that in the Google results there's no summary extract, no cache link, and no notice of the page size. It's just a raw URL sitting there in the results. That's weird and wrong, and to me indicates a problem in their index.
>> (and for some reason one without the "You've followed a link to a page that doesn't exist.." explanation.)
Hmm, I *do* see that message when I follow the link.
-- brion vibber (brion @ pobox.com)