[Mediawiki-l] RE: googlebot

Rowan Collins rowan.collins at gmail.com
Wed Aug 17 11:57:35 UTC 2005


On 17/08/05, Andres Obrero <andres at holzapfel.ch> wrote:
> OK, robots.txt seems worth using.
> At the moment Google crawls endlessly through
> /mediawiki/index.php?title=Spezial:Recentchanges&from=2005...
> What rule avoids this type of request?
> I don't want to disallow index.php entirely,
> only ?title=Spezial:Recentchanges

Well, as Christof says, you could use the following to tell bots not
to look at Recentchanges at all:

User-agent: *
Disallow: /wiki/Special:Recentchanges
Disallow: /wiki/Special%3ARecentchanges

Or you could take the same approach as Wikimedia, and only let spiders
access pages with no extra parameters - assuming you have a rewrite
rule that serves "plain" pages as "/wiki/Page" or some such; see
http://mail.wikipedia.org/pipermail/wikitech-l/2005-August/031032.html
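
As a sketch, assuming your rewrite rule serves plain pages under
/wiki/ and the script itself lives at /mediawiki/index.php (both paths
are examples - adjust them to your setup), that approach looks like:

User-agent: *
Allow: /wiki/
Disallow: /mediawiki/

Note that "Allow" is a non-standard extension honoured by Googlebot
but not by every crawler; the conservative baseline is the Disallow
line on its own, which already keeps bots away from every
index.php?title=...&from=... style URL.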

The advantage of that approach is that you may actually want spiders
to read your recent changes, because it helps them spot which pages
need re-indexing.

-- 
Rowan Collins BSc
[IMSoP]
