Kasimir Gabert wrote:
Hello,
Excluding index.php using robots.txt should work if an article link on
your page is http://mysite.tld/My_Page. The robots would then not crawl
http://mysite.tld/index.php?title=My_Page&action=edit, etc.
Kasimir, I believe you have described exactly the solution I need. The
article links on my site (http://wikigogy.org) are indeed made without
reference to index.php, while the 'edit', 'history', and other action
pages that I wish to exclude do use that reference. I had not realized
there was such a simple, elegant solution. I will try it. It should
look like this in my wiki's robots.txt, right?
User-agent: *
Disallow: /index.php*

Is the trailing asterisk on /index.php* correct and needed?
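If I have understood robots.txt prefix matching correctly, that rule
should block the action URLs while leaving the short article URLs
alone. Using my own site as the example, I would expect:

blocked:  http://wikigogy.org/index.php?title=My_Page&action=edit
blocked:  http://wikigogy.org/index.php?title=My_Page&action=history
allowed:  http://wikigogy.org/My_Page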
Thank you,
Roger
On 10/1/06, Sy Ali <sy1234(a)gmail.com> wrote:
> On 9/25/06, Roger Chrisman <roger(a)rogerchrisman.com> wrote:
> > But in the interest of short URLs, I serve my MediaWiki directly
> > from the site root, without any /wiki/ or /w/ directories, so the
> > above method would not work on my installation.
> >
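> > (By "the above method" I mean the usual setup where article URLs
> > look like /wiki/My_Page while the scripts live under /w/, so that
> > robots.txt can simply say something like:
> >
> > User-agent: *
> > Disallow: /w/
> >
> > With everything served from the root, there is no such prefix to
> > block.)
> >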
> > Any ideas how I can exclude robots from crawling all my wiki's
> > edit, history, talk, etc, pages *without* excluding its article
> > pages?
>
> I do the same thing, and I never did figure out the rules to
> disallow the other sub-pages.
>
> As I understand it, there are "nofollow" tags within the web pages
> themselves, but I'm not certain they are being honoured.
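>
> The tag in question, if I remember right, appears in the HTML head
> of the edit and history views and looks something like:
>
> <meta name="robots" content="noindex,nofollow">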