On 1/17/06, Andrew Gray <shimgray@gmail.com> wrote:
> It's logistically quite tricky to arrange matters so the spiders understand the difference between a talk page and a "real" page; "allowing" isn't the key, it's "why don't we prevent it", and the answer is "if we tried it probably wouldn't work very well".
> Not to stop anyone attempting something, but...
Wouldn't adding
Disallow: /wiki/Wikipedia:Articles_for_deletion
to http://meta.wikimedia.org/robots.txt do the trick? I assume the search engines will treat the subpages as directories, since they're separated by slashes.
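
For what it's worth, a quick sketch with Python's standard urllib.robotparser (which does plain prefix matching) suggests a single Disallow line would cover the subpages as well; the page names below are made up, just to illustrate:

# Rough check: does one Disallow line also cover AfD subpages under
# plain prefix matching?  The subpage name here is hypothetical.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse("""
User-agent: *
Disallow: /wiki/Wikipedia:Articles_for_deletion
""".splitlines())

for path in (
    "/wiki/Wikipedia:Articles_for_deletion",
    "/wiki/Wikipedia:Articles_for_deletion/Some_example_article",  # hypothetical subpage
    "/wiki/Some_example_article",  # ordinary article, should stay crawlable
):
    allowed = parser.can_fetch("*", "http://en.wikipedia.org" + path)
    print(path, "->", "allowed" if allowed else "blocked")

That prints "blocked" for the AfD page and its subpage and "allowed" for the ordinary article, so it seems to come down to prefix matching rather than anything directory-specific.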
Can robots.txt use wildcards? If it can, we could quite easily restrict caching of the entire Wikipedia namespace, if we wanted (and I doubt we would), using:
Disallow: /wiki/Wikipedia:*
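
As far as I can tell, the base robots.txt convention is prefix matching only, and * is a non-standard extension that only some crawlers honour, so a plain "Disallow: /wiki/Wikipedia:" might do the same job. A rough sketch comparing the two readings, again with made-up page names:

# Sketch comparing the two matching schemes in play: plain prefix matching
# (the original robots.txt behaviour) versus the non-standard * wildcard
# extension some crawlers support.  Page names are hypothetical.
import re

PREFIX_RULE = "/wiki/Wikipedia:"      # plain prefix rule, no wildcard
WILDCARD_RULE = "/wiki/Wikipedia:*"   # the wildcard variant above

def prefix_blocked(path, rule=PREFIX_RULE):
    # Standard behaviour: Disallow is a simple "starts with" test.
    return path.startswith(rule)

def wildcard_blocked(path, rule=WILDCARD_RULE):
    # Crawlers that support * treat it as "any sequence of characters",
    # roughly this regex.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(pattern, path) is not None

for path in (
    "/wiki/Wikipedia:Articles_for_deletion/Some_example_article",  # hypothetical
    "/wiki/Wikipedia:Village_pump",
    "/wiki/Some_example_article",
):
    print(path, "prefix:", prefix_blocked(path), "wildcard:", wildcard_blocked(path))

Both readings block the Wikipedia: pages and leave the ordinary article alone, so the wildcard looks like a belt-and-braces addition at most.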
-- Sam