-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Tuesday 01 May 2007 08:07:13 Rob Church wrote:
> On 01/05/07, Michael Daly <mikedaly@magma.ca> wrote:
> > What about the use of "noindex" to prevent the indexing of old
> > versions of pages? I read about this on (a somewhat out-of-date)
> > chongqed.org
>
> We do this, as far as I'm aware.
>
> > Apparently, they can use the "recent changes" pages too. I find
> > that the search engines are frequently accessing recent changes,
> > but I'm not sure how to stop that.
>
> Special pages should all be emitting appropriate <meta> tags with
> "noindex,nofollow" set, so search engines *oughtn't* to be indexing
> or following links from these.
Yeah, but they will still grab them, causing traffic. And then there are the bots that don't obey robots.txt or "noindex, nofollow"...
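For reference, the tag in question looks something like this in the page <head> (the exact markup MediaWiki emits may vary between versions; this is just the general form):

```html
<!-- emitted on special pages, old revisions etc. to keep well-behaved
     crawlers from indexing them or following their links -->
<meta name="robots" content="noindex,nofollow" />
```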
I came up with this:
User-agent: BecomeBot
User-agent: gonzo
User-agent: NPBot
User-agent: TMCrawler
Disallow: /

User-agent: googlebot
Crawl-delay: 30
Disallow: /wiki/index.php?title=Special:
Disallow: /wiki/index.php?title=Internal:
Disallow: /wiki/index.php?title=MediaWiki:
...

User-agent: *
Crawl-delay: 120
Disallow: /wiki/
...
barring MSN and Yahoo from the wiki completely, since the three big search engines together caused about 90% of the traffic on my small wiki, crawling every old page revision (via Special:Recentchanges) etc.
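If you want to sanity-check rules like these before deploying them, Python's standard urllib.robotparser can evaluate a robots.txt locally. A quick sketch, using a trimmed-down version of the rules above (adjust the paths to your own wiki layout):

```python
# Check robots.txt rules locally with Python's standard urllib.robotparser
# (a sketch; the paths mirror my config above, adjust to your own setup).
from urllib import robotparser

RULES = """\
User-agent: googlebot
Crawl-delay: 30
Disallow: /wiki/index.php?title=Special:

User-agent: *
Crawl-delay: 120
Disallow: /wiki/
"""

rp = robotparser.RobotFileParser()
rp.parse(RULES.splitlines())

# googlebot may not fetch Special: pages, but regular articles are fine;
# every other crawler is shut out of /wiki/ entirely.
print(rp.can_fetch("googlebot", "/wiki/index.php?title=Special:Recentchanges"))  # False
print(rp.can_fetch("googlebot", "/wiki/index.php?title=Main_Page"))              # True
print(rp.can_fetch("ExampleBot", "/wiki/index.php?title=Main_Page"))             # False
```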
If you have a smaller wiki, teergrubing (tarpitting) certain user agents (like "Java", "larbot", "-" etc.) might also make a lot of sense. See http://bloodgate.com/drowns/example for the effect this has :)
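The idea behind teergrubing is simply to make abusive clients wait instead of (or before) serving them. A minimal sketch of the decision logic (the agent prefixes and the 60-second delay are made up for illustration; you would call time.sleep() on the result in your request handler):

```python
# Teergrubing sketch: pick a stall time per User-Agent header. The
# prefixes and default delay are illustrative only; tune them for
# your own traffic.
TARPIT_PREFIXES = ("java", "larbot", "-")

def tarpit_delay(user_agent, delay=60):
    """Seconds to stall a request; 0 means serve it normally."""
    ua = (user_agent or "-").strip().lower()
    if ua.startswith(TARPIT_PREFIXES):
        return delay
    return 0

# In a request handler: time.sleep(tarpit_delay(request_user_agent))
```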
All the best,
Tels
- --
Signed on Tue May 1 10:23:37 2007 with key 0x93B84C15.
Get one of my photo posters: http://bloodgate.com/posters
PGP key on http://bloodgate.com/tels.asc or per email.
"A witty saying proves nothing."
-- Voltaire