On 12/30/05, Brion Vibber brion@pobox.com wrote: (snip)
> To begin with, old versions are specifically marked for spiders not to index them. Deleted pages aren't accessible to an outside spider at all.
> If your robots.txt is not set up properly to keep robots from visiting those pages, you should set that up as well, though that's just to keep useless load off the server.
So does this mean that I /must/ tweak my own robots.txt to keep robots from crawling history, or that I /don't need to/?
I had heard that there is a proper meta tag (or something similar) to tell spiders not to delve into revisions. Where can I learn more about this issue?
I visited the meta page, but it doesn't go into detail: http://meta.wikimedia.org/wiki/Robots.txt
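For what it's worth, the setup Brion describes usually involves two pieces: MediaWiki emits a robots meta tag on history and old-revision views, and the site's robots.txt blocks the script path. A sketch, assuming the common configuration where readable article URLs live under /wiki/ and all script URLs (history, diffs, old revisions) go through /w/index.php — those paths are assumptions, so check your own setup:

```
# robots.txt -- assumes articles are served from /wiki/ and
# history/diff/old-revision URLs all go through /w/index.php,
# so blocking /w/ keeps spiders off every script-generated view.
User-agent: *
Disallow: /w/
```

Independently of robots.txt, MediaWiki adds a tag along the lines of `<meta name="robots" content="noindex,nofollow">` to the HTML head of history and old-revision pages, so compliant spiders skip them even when the URLs are reachable. The robots.txt rule mainly saves server load; the meta tag is what keeps the pages out of the index.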