[Mediawiki-l] robots.txt

Roger Chrisman roger at rogerchrisman.com
Mon Oct 2 06:43:59 UTC 2006


Roger Chrisman wrote:
> Kasimir Gabert wrote:
> > Hello,
> >
> > Excluding index.php using robots.txt should work if an article link
> > on your page is http://mysite.tld/My_Page.  The robots would then
> > not crawl http://mysite.tld/index.php?title=My_Page&action=edit,
> > etc.
>
> Kasimir, I believe you have written above a beautiful solution for my
> need. My article links on my site (http://wikigogy.org) are indeed
> done without reference to index.php but the 'edit', 'history' and
> other action pages that I wish to exclude are done with that
> reference. I had not realized this simple elegant solution. I will
> try it. It should look like this in my-wiki/robots.txt, right?:
>
> User-agent: *
> Disallow: index.php*
>
> Is the asterisk on index.php* correct and needed?

On reflection, I think I should NOT have the asterisk in the URL prefix.
As I read it, the asterisk is only for the User-agent line, where it
means all robots. The prefix also needs a leading slash, because the
Disallow value is compared against the start of the URL path, as in the
examples on robotstxt.org. So I think it should look like this in
my-site/robots.txt:


User-agent: *
Disallow: /index.php


This will disallow robots from every URL whose path is, or starts
with, "/index.php". All the action page URLs on my site start with
that, but article URLs do not, because I am using pretty URLs.
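
A quick way to sanity-check the prefix matching is Python's
standard-library urllib.robotparser. The sketch below is only an
illustration under a few assumptions: the rule is written with a leading
slash (/index.php), as in the robotstxt.org examples, and the mysite.tld
URLs are the placeholder addresses from Kasimir's message, not a real
site. It shows how this one parser reads the rule; actual crawlers may
behave differently.

```python
from urllib.robotparser import RobotFileParser

# Rule under test. The leading slash is an assumption based on the
# robotstxt.org examples; mysite.tld below is a placeholder host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Pretty article URL: path does not start with /index.php.
article_url = "http://mysite.tld/My_Page"
# Action URL: path starts with /index.php.
edit_url = "http://mysite.tld/index.php?title=My_Page&action=edit"

for url in (article_url, edit_url):
    verdict = "allowed" if parser.can_fetch("*", url) else "blocked"
    print(f"{verdict}: {url}")
```

With this parser the article URL comes back allowed and the edit URL
blocked, which is the split I am after.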

I read up on robots.txt here:
* http://www.robotstxt.org/wc/norobots.html#format
* http://www.robotstxt.org/wc/exclusion-admin.html

Thanks :-)
Roger


