On 17/08/05, Andres Obrero andres@holzapfel.ch wrote:
Ok, robots.txt seems worth using. At the moment Google is crawling endlessly through /mediawiki/index.php?title=Spezial:Recentchanges&from=2005... What rule avoids this type of request? I don't want to disallow index.php entirely, only ?title=Spezial:Recentchanges
Well, as Christof says, you could use the following to tell bots not to look at RC at all:

    User-agent: *
    Disallow: /wiki/Special:Recentchanges
    Disallow: /wiki/Special%3ARecentchanges
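If you want to double-check rules like these before letting spiders loose, Python's standard-library robots.txt parser can be used as a quick sanity check (a sketch only; the example.org host is made up):

```python
# Verify that the Recentchanges rules block what we expect, using
# Python's stdlib robots.txt parser (urllib.robotparser).
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/Special:Recentchanges
Disallow: /wiki/Special%3ARecentchanges
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# An ordinary article stays crawlable...
print(rp.can_fetch("*", "http://example.org/wiki/Main_Page"))                # True
# ...but Recentchanges is blocked.
print(rp.can_fetch("*", "http://example.org/wiki/Special:Recentchanges"))    # False
```

Note that the parser matches by path prefix, so the rule also covers URLs with extra parameters appended after the page title.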
Or you could take the same approach as Wikimedia and only let spiders access pages with no extra parameters, assuming you have a rewrite rule that serves "plain" pages as "/wiki/Page" or some such; see http://mail.wikipedia.org/pipermail/wikitech-l/2005-August/031032.html
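For the parameter-based approach, the pieces might look like this. The paths are assumptions (scripts under /w/, pretty URLs under /wiki/), so adjust them to your install. First, an Apache mod_rewrite rule to serve "plain" pages:

```apache
# Sketch for httpd.conf / .htaccess: map /wiki/Page to the real script.
RewriteEngine On
RewriteRule ^/?wiki/(.*)$ /w/index.php?title=$1 [L,QSA]
```

Then robots.txt only needs to block the raw script directory, so every URL that carries query parameters (index.php?title=...&from=...) is off-limits while the clean /wiki/ URLs stay crawlable:

```
User-agent: *
Disallow: /w/
```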
The advantage of that approach is that you may actually want spiders to fetch your recent changes, because it helps them spot what needs re-indexing.