Roger Chrisman wrote:
Kasimir Gabert wrote:
Hello,
Excluding index.php using robots.txt should work if an article link on your page is http://mysite.tld/My_Page. The robots would then not crawl http://mysite.tld/index.php?title=My_Page&action=edit, etc.
Kasimir, I believe you have written above a beautiful solution for my need. My article links on my site (http://wikigogy.org) are indeed done without reference to index.php, but the 'edit', 'history' and other action pages that I wish to exclude are done with that reference. I had not realized this simple, elegant solution. I will try it. It should look like this in my-wiki/robots.txt, right?
User-agent: *
Disallow: index.php*
Is the asterisk on index.php* correct and needed?
I think I should NOT have the asterisk in the URL prefix. I think the asterisk is only for the User-agent line, where it means all robots. I think it should look like this in my-site/robots.txt:
User-agent: *
Disallow: index.php
and it will disallow robots from everything that is, or starts with, "index.php". All the action page URLs on my site start with that, but the article URLs do not, because I am using pretty URLs.
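One way to check a rule like this before relying on it is to run it through a robots.txt parser. The sketch below uses Python's standard urllib.robotparser. Note one assumption that is not in the rule as written above: the Disallow value here starts with a slash ("/index.php"), because the format page linked below describes the value as a URL path prefix and its examples all begin with "/". The host mysite.tld and the helper name is_allowed are placeholders for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Rule under test. The leading "/" is an assumption added for this sketch;
# the rule quoted in the message above has no slash.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php
"""


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Placeholder URLs: one pretty article URL and two action URLs.
    urls = (
        "http://mysite.tld/My_Page",
        "http://mysite.tld/index.php?title=My_Page&action=edit",
        "http://mysite.tld/index.php?title=My_Page&action=history",
    )
    for url in urls:
        print(url, "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked")
```

This only shows how Python's parser reads the rule. Individual crawlers may interpret robots.txt differently, so it is a sanity check rather than a guarantee.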
I read up on robots.txt here:
* http://www.robotstxt.org/wc/norobots.html#format
* http://www.robotstxt.org/wc/exclusion-admin.html
Thanks :-)
Roger