On 8/18/07, jidanni@jidanni.org <jidanni@jidanni.org> wrote:
> OK, now in http://radioscanningtw.jidanni.org/robots.txt I'm trying the common extended protocol "Disallow: /*&"
It seems only fully-specified prefixes with no wildcards are permitted in robots.txt:
"The value of this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that starts with this value will not be retrieved. For example, Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html." http://www.robotstxt.org/wc/norobots.html
So I don't think there's any way to do this without prettified URLs, short of listing every possible page. Alternatively, you could hack up the code to make URLs look like ?action=edit&title=Foo instead of the reverse, so that a plain prefix rule could match them. In light of this, it does seem like a good idea to use rel="nofollow" on links to things like edit pages. Someone just needs to code it.
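To make the prefix rule concrete, here is a minimal Python sketch of the matching described in the quoted passage. The function name is_disallowed and the sample wiki URLs are my own illustrative assumptions, not code from any actual crawler, and some robots do implement wildcard extensions on top of this.

```python
def is_disallowed(url_path, disallow_values):
    """Return True if url_path starts with any non-empty Disallow value.

    This is plain prefix matching, as in the passage quoted above:
    characters such as '*' are compared literally, not as wildcards.
    """
    return any(value and url_path.startswith(value) for value in disallow_values)


# The example from the quoted text.
print(is_disallowed("/help.html", ["/help"]))         # True
print(is_disallowed("/help/index.html", ["/help"]))   # True
print(is_disallowed("/help/index.html", ["/help/"]))  # True
print(is_disallowed("/help.html", ["/help/"]))        # False

# Under strict prefix matching, "/*&" is a literal string and matches nothing useful.
print(is_disallowed("/index.php?title=Foo&action=edit", ["/*&"]))  # False

# If the parameter order were reversed (hypothetical URL layout), a prefix would work.
print(is_disallowed("/index.php?action=edit&title=Foo", ["/index.php?action="]))  # True
```

The last two calls show why the parameter order matters: a prefix can only capture what comes first in the URL.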
> but I see even the standard "Disallow: /index.php?title=Special:" is ignored here:
Maybe you added extra line breaks? A blank line terminates a section, and any section not starting with a User-Agent line will be ignored.
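As a rough illustration of how a stray blank line can orphan a Disallow line, here is a small Python sketch of record splitting. parse_records and the two sample files are hypothetical, written only for this message, and real robots may be more lenient than this.

```python
def parse_records(text):
    """Split robots.txt text into records separated by blank lines.

    Returns a list of (user_agents, disallow_values) tuples. A record
    whose first field is not User-agent is dropped, mirroring the
    behaviour described above.
    """
    records, current = [], []
    for raw in text.splitlines() + [""]:  # the extra "" flushes the last record
        stripped = raw.strip()
        if stripped.startswith("#"):
            continue  # comment-only line: skipped, not treated as a boundary
        line = stripped.split("#", 1)[0].strip()
        if line:
            field, _, value = line.partition(":")
            current.append((field.strip().lower(), value.strip()))
            continue
        # Blank line: close the current record, keeping it only if valid.
        if current and current[0][0] == "user-agent":
            agents = [v for f, v in current if f == "user-agent"]
            rules = [v for f, v in current if f == "disallow"]
            records.append((agents, rules))
        current = []
    return records


# Hypothetical file with an extra blank line after User-agent.
broken = "User-agent: *\n\nDisallow: /index.php?title=Special:\n"
# The same rule without the blank line.
fixed = "User-agent: *\nDisallow: /index.php?title=Special:\n"

print(parse_records(broken))  # [(['*'], [])]
print(parse_records(fixed))   # [(['*'], ['/index.php?title=Special:'])]
```

In the broken case the User-agent record ends up with no rules at all, and the Disallow line forms a record of its own that gets discarded.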