On 8/18/07, jidanni(a)jidanni.org <jidanni(a)jidanni.org> wrote:
> OK, now in
> http://radioscanningtw.jidanni.org/robots.txt I'm trying
> the common extended protocol "Disallow: /*&"
It seems only fully-specified prefixes with no wildcards are permitted
in robots.txt:
"The value of this field specifies a partial URL that is not to be
visited. This can be a full path, or a partial path; any URL that
starts with this value will not be retrieved. For example, Disallow:
/help disallows both /help.html and /help/index.html, whereas
Disallow: /help/ would disallow /help/index.html but allow
/help.html."
<http://www.robotstxt.org/wc/norobots.html>
So I don't think there's any way to do this without prettified URLs,
short of listing every possible page. Or you could hack up the code to
make URLs look like ?action=edit&title=Foo instead of the reverse, so
that a plain prefix would match them. In light of this, it does seem
like a good idea to use rel="nofollow" on links to things like edit
pages. Someone just needs to code it.
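
To make the prefix rule concrete, here is a minimal Python sketch of how
a crawler that follows only the original spec quoted above would test a
URL. The function name is_disallowed and the sample paths are made up
for illustration; real crawlers may add extensions such as wildcards.

```python
def is_disallowed(path, disallow_prefixes):
    """Return True if path starts with any non-empty Disallow value.

    Plain prefix matching only, as in the original robots.txt spec:
    characters like '*' are treated literally, not as wildcards.
    """
    return any(p and path.startswith(p) for p in disallow_prefixes)


# The example from the spec text.
print(is_disallowed("/help.html", ["/help"]))         # True
print(is_disallowed("/help/index.html", ["/help/"]))  # True
print(is_disallowed("/help.html", ["/help/"]))        # False

# "/*&" is just a literal three-character prefix here, so it matches nothing useful.
print(is_disallowed("/index.php?title=Foo&action=edit", ["/*&"]))  # False

# With the parameter order reversed, a fixed prefix is enough.
print(is_disallowed("/index.php?action=edit&title=Foo",
                    ["/index.php?action=edit"]))      # True
```

The last case is why putting action= first in the query string would
help: the part that needs blocking ends up at the front of the URL.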
> but I see even the
> standard "Disallow: /index.php?title=Special:" is ignored here:
Maybe you added extra line breaks? A blank line terminates a section,
and any section not starting with a User-Agent line will be ignored.
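
As a rough illustration of the blank-line problem, here is a small
Python sketch of a parser that splits a robots.txt into sections and
drops any that do not begin with a User-agent line. The name
parse_records and the sample file contents are hypothetical, and this
is a simplification of what real crawlers do, not their actual code.

```python
import re


def parse_records(text):
    """Split robots.txt text into sections separated by blank lines.

    Returns a list of (user_agents, disallow_prefixes) tuples. Sections
    that do not start with a User-agent line are ignored entirely.
    """
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Strip comments and surrounding whitespace, drop empty lines.
        lines = [ln.split("#", 1)[0].strip() for ln in block.splitlines()]
        lines = [ln for ln in lines if ln]
        if not lines or not lines[0].lower().startswith("user-agent:"):
            continue  # orphaned section: nothing in it applies to anyone
        agents, disallows = [], []
        for ln in lines:
            field, _, value = ln.partition(":")
            field, value = field.strip().lower(), value.strip()
            if field == "user-agent":
                agents.append(value)
            elif field == "disallow" and value:
                disallows.append(value)
        records.append((agents, disallows))
    return records


# A stray blank line separates the Disallow from its User-agent line.
broken = "User-agent: *\n\nDisallow: /index.php?title=Special:\n"
fixed = "User-agent: *\nDisallow: /index.php?title=Special:\n"

print(parse_records(broken))  # [(['*'], [])]
print(parse_records(fixed))   # [(['*'], ['/index.php?title=Special:'])]
```

In the broken case the Disallow line lands in its own section with no
User-agent, so it is silently dropped.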