If a spider goes to Recent Changes and then to "Last 5000 changes" (and last 90 days, and last 30 days, and last 2500 changes, and last 1000 changes, and every such combination), it seems to me the server load could get pretty high. Perhaps talk pages should be spidered, but not Recent Changes or the page histories (diffs/changes).
I agree. Every RecentChanges page contains links to 13 other RecentChanges pages, and one of them changes its URL each time the page is loaded. The other special: pages like statistics, all pages, most wanted, etc. seem to be good candidates for robot exclusion as well: they stress the database but don't provide much useful information for search indices.
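For illustration only (the path prefix and the idea that all of these live under a common Special: prefix are guesses on my part and would have to be adjusted to whatever URLs the server actually uses), a robots.txt along these lines would keep well-behaved spiders off all of them at once, since a Disallow line matches any URL that begins with the given prefix:

 # keep spiders off the dynamic special pages (path is an assumption)
 User-agent: *
 Disallow: /wiki/Special:
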
Regarding talk:, wikipedia: and user: pages, I don't see any reason not to have them indexed.
Diff pages seem to be useless to spiders since the same information is contained in the two article versions.
The remaining question is: what about article histories and old versions of articles? Do we want Google to have a copy of every version of every article, or only the current one?
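If the answer turns out to be "current version only", one thing to keep in mind is that robots.txt can only match URL prefixes. If old versions, diffs and histories are distinguished from the current article merely by a query parameter on the same script (that is my assumption about the current URL scheme), they can't be singled out there; in that case the software could instead emit a robots meta tag on those pages, e.g.

 <meta name="robots" content="noindex,follow">

which tells spiders to follow the links on the page but not to keep or index a copy of it.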
Axel