Re: [Mediawiki-l] robots.txt

2 Oct 2006


Roger Chrisman wrote:
...
Kasimir Gabert wrote:
...
Hello,

Excluding index.php using robots.txt should work if an article link
on your page is http://mysite.tld/My_Page.  The robots would then
not crawl http://mysite.tld/index.php?title=My_Page&action=edit,
etc.

Kasimir, I believe you have written above a beautiful solution for my
need. My article links on my site (http://wikigogy.org) are indeed
done without reference to index.php but the 'edit', 'history' and
other action pages that I wish to exclude are done with that
reference. I had not realized this simple elegant solution. I will
try it. It should look like this in my-wiki/robots.txt, right?:

User-agent: *
Disallow: index.php*

Is the asterisk on index.php* correct and needed?

I think I should NOT have the asterisk in the URL prefix. I think the
asterisk is only for the User-agent line, where it means all robots. I
think it should look like this in my-site/robots.txt:

User-agent: *
Disallow: /index.php

That will disallow robots from every URL whose path is, or starts
with, "/index.php". All the action page URLs on my site start with
that, but the article URLs do not, because I am using pretty URLs.
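
One way to sanity-check the prefix rule before relying on it is to feed the two lines to a robots.txt parser and ask it about a few URLs. The sketch below uses Python's standard-library urllib.robotparser. It assumes the Disallow path is written with a leading slash, as in the examples on the robotstxt.org pages; the helper name is_allowed and the agent name "ExampleBot" are made up for illustration, and mysite.tld is the placeholder host used earlier in this thread. It only shows how this one parser reads the rules, and real crawlers may behave differently.

```python
from urllib.robotparser import RobotFileParser

# The rules under discussion, with the path written from the site root
# (the leading slash is an assumption, see the note above).
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php
"""


def is_allowed(url, robots_txt=ROBOTS_TXT, agent="ExampleBot"):
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines,
    # so nothing is fetched over the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    urls = (
        "http://mysite.tld/My_Page",
        "http://mysite.tld/index.php?title=My_Page&action=edit",
        "http://mysite.tld/index.php?title=My_Page&action=history",
    )
    for url in urls:
        print("allowed" if is_allowed(url) else "blocked", url)
```

With these rules the pretty article URL should come back as allowed and the two index.php action URLs as blocked, since the parser does a plain "starts with" comparison on the path.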
I read up on robots.txt here:
* http://www.robotstxt.org/wc/norobots.html#format
* http://www.robotstxt.org/wc/exclusion-admin.html
Thanks :-)
Roger
