> if a spider goes to Recent Changes and then to "Last 5000 changes" (and last 90 days, and last 30 days, and last 2500 changes, and last 1000 changes, and every such combination) it seems to me the server load could get pretty high. Perhaps talk pages should be spidered, but not recent changes or the history (diff/changes).
I agree. Every RecentChanges page contains links to 13 other RecentChanges views, and one of them changes its URL each time the page is loaded. The other special: pages, like statistics, all pages, most wanted, etc., seem to be good candidates for robot exclusion as well: they stress the database but don't provide much useful information for indices.
Regarding talk:, wikipedia: and user: pages, I don't see any reason not to have them indexed.
Diff pages seem to be useless to spiders since the same information is contained in the two article versions.
The remaining question is: what about article histories and old versions of articles? Do we want Google to have a copy of every version of every article, or only the current one?
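The exclusions proposed above can be written as a robots.txt file. The sketch below is only an illustration under assumptions: it invents a URL layout (articles and talk:, user: and wikipedia: pages under `/wiki/<Title>`, special pages under `/wiki/Special:<Name>`, and diffs, histories and old revisions served by a hypothetical script under `/w/`) and uses Python's standard `urllib.robotparser` to check which URLs a well-behaved spider would skip.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the paths assume a made-up layout:
#   /wiki/<Title>          articles plus talk:, user: and wikipedia: pages
#   /wiki/Special:<Name>   RecentChanges, statistics, most wanted, ...
#   /w/wiki.phtml?...      diffs, histories and old revisions
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/Special:
Disallow: /w/
"""


def make_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = make_parser(ROBOTS_TXT)
    base = "http://example.org"  # placeholder host
    for path in [
        "/wiki/Physics",
        "/wiki/Talk:Physics",
        "/wiki/Special:RecentChanges?days=30&limit=500",
        "/w/wiki.phtml?title=Physics&diff=123&oldid=122",
    ]:
        # can_fetch() applies the rules of the "User-agent: *" entry
        print(f"{path}: {'allowed' if rp.can_fetch('*', base + path) else 'excluded'}")
```

The original exclusion standard matches only URL prefixes, so with a layout like this one a diff cannot be separated from a history or an old version served by the same script: disallowing `/w/` excludes all three, which is one possible answer to the question about histories.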
Axel
At 08:42 PM 5/18/02 +0200, Axel Boldt wrote:
> I agree. Every RecentChanges page contains links to 13 other RecentChanges views, and one of them changes its URL each time the page is loaded. The other special: pages, like statistics, all pages, most wanted, etc., seem to be good candidates for robot exclusion as well: they stress the database but don't provide much useful information for indices.
Actually, wouldn't "All pages" be a very _good_ page to allow spiders to read? It would let them cut straight to the heart of the matter and get a list of all the pages they need from Wikipedia. At the very least, they should be allowed to read the orphans list, since the pages listed there won't be found by spidering through the conventional pages.
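The suggestion to keep "All pages" and the orphans list open while excluding the other special: pages can be sketched the same way. The page names `Special:Allpages` and `Special:Lonelypages` and the `/wiki/` paths are hypothetical. The Allow lines are placed before the Disallow line because Python's `urllib.robotparser` applies the first rule whose path prefix matches; crawlers that pick the longest matching rule reach the same result. Allow was not part of the original 1994 exclusion standard, so older spiders may ignore it and skip these pages anyway.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: spiders may read the complete page list and the
# orphans list, but no other special page. Allow lines come first because
# urllib.robotparser uses the first rule whose path prefix matches the URL.
ROBOTS_TXT = """\
User-agent: *
Allow: /wiki/Special:Allpages
Allow: /wiki/Special:Lonelypages
Disallow: /wiki/Special:
"""


def make_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = make_parser(ROBOTS_TXT)
    base = "http://example.org"  # placeholder host
    for name in ["Allpages", "Lonelypages", "Statistics", "RecentChanges"]:
        url = f"{base}/wiki/Special:{name}"
        print(f"Special:{name}: {'allowed' if rp.can_fetch('*', url) else 'excluded'}")
```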
-- "Let there be light." - Last words of Bomb #20, "Dark Star"
wikipedia-l@lists.wikimedia.org