If a spider goes to Recent Changes and then to "Last 5000 changes" (and last 90 days, and last 30 days, and last 2500 changes, and last 1000 changes, and every such combination), it seems to me the server load could get pretty high. Perhaps talk pages should be spidered, but not Recent Changes or the page histories (diffs/changes).
I agree. Every RecentChanges page contains links to 13 other RecentChanges pages, and one of them changes its URL each time the page is loaded. The other special: pages like statistics, all pages, most wanted, etc. seem to be good candidates for robot exclusion as well: they stress the database but don't provide much useful information for search indices.
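For illustration only (the path prefix and the idea that all of these live under a common Special: prefix are guesses on my part and would have to be adjusted to whatever URLs the server actually uses), a robots.txt along these lines would keep well-behaved spiders off all of them at once, since a Disallow line matches any URL that begins with the given prefix:

 # keep spiders off the dynamic special pages (path is an assumption)
 User-agent: *
 Disallow: /wiki/Special:
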
Regarding talk:, wikipedia: and user: pages, I don't see any reason not to have them indexed.
Diff pages seem to be useless to spiders since the same information is contained in the two article versions.
The remaining question is: what about article histories and old versions of articles? Do we want Google to have a copy of every version of every article, or only the current one?
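If the answer turns out to be "current version only", one thing to keep in mind is that robots.txt can only match URL prefixes. If old versions, diffs and histories are distinguished from the current article merely by a query parameter on the same script (that is my assumption about the current URL scheme), they can't be singled out there; in that case the software could instead emit a robots meta tag on those pages, e.g.

 <meta name="robots" content="noindex,follow">

which tells spiders to follow the links on the page but not to keep or index a copy of it.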
Axel