I've just completed the initial version of a "Combating spam" page on MediaWiki.org, which can be found at http://www.mediawiki.org/wiki/Manual:Combating_spam. I'm hoping that with the right incoming links, it will become an introduction to dealing with the issue for new wiki operators, since questions about anti-spam measures are among the most common we receive.
I'm throwing the URL out here both to gather feedback, and to make it known to the user base at large.
Rob Church
Rob Church wrote:
I'm throwing the URL out here both to gather feedback, and to make it known to the user base at large.
What about the use of "noindex" to prevent the indexing of old versions of pages? I read about this on (a somewhat out-of-date) chongqed.org page, but I don't think I understand it fully. There is also a reference to using robots.txt to prevent this, but that's even less clear (links on the page appear to be broken).
If we zap a page that's been spammed but the search engines still index it, spammers can identify the wiki as one carrying old spam and will re-spam it.
Apparently, they can use the "recent changes" pages too. I find that the search engines are frequently accessing recent changes, but I'm not sure how to stop that.
Mike
On 01/05/07, Michael Daly mikedaly@magma.ca wrote:
What about the use of "noindex" to prevent the indexing of old versions of pages? I read about this on (a somewhat out-of-date) chongqed.org
We do this, as far as I'm aware.
Apparently, they can use the "recent changes" pages too. I find that the search engines are frequently accessing recent changes, but I'm not sure how to stop that.
Special pages should all be emitting appropriate <meta> tags with "noindex,nofollow" set, so search engines *oughtn't* to be indexing or following links from these.
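For reference, the tag in question is the standard robots meta tag; on those pages the emitted markup should look something like this (a sketch of the conventional form, not copied verbatim from the MediaWiki source):

```html
<!-- Emitted in the <head> of special pages and old revisions;
     tells compliant crawlers not to index the page
     or follow any of its links. -->
<meta name="robots" content="noindex,nofollow" />
```

Of course, this only helps against crawlers that honour the robots conventions in the first place.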
Rob Church
On Tuesday 01 May 2007 08:07:13 Rob Church wrote:
On 01/05/07, Michael Daly mikedaly@magma.ca wrote:
What about the use of "noindex" to prevent the indexing of old versions of pages? I read about this on (a somewhat out-of-date) chongqed.org
We do this, as far as I'm aware.
Apparently, they can use the "recent changes" pages too. I find that the search engines are frequently accessing recent changes, but I'm not sure how to stop that.
Special pages should all be emitting appropriate <meta> tags with "noindex,nofollow" set, so search engines *oughtn't* to be indexing or following links from these.
Yeah, but they will still grab them, causing traffic. And then there are the bots that don't obey robots.txt or "noindex, nofollow"...
I came up with this:
User-agent: BecomeBot
User-agent: gonzo
User-agent: NPBot
User-agent: TMCrawler
Disallow: /

User-agent: googlebot
Crawl-delay: 30
Disallow: /wiki/index.php?title=Special:
Disallow: /wiki/index.php?title=Internal:
Disallow: /wiki/index.php?title=MediaWiki:
...

User-agent: *
Crawl-delay: 120
Disallow: /wiki/
...
This forbids MSN and Yahoo the wiki completely, since the three big search engines together caused about 90% of the traffic to my small wiki, crawling through every old page revision (via Special:Recentchanges) etc.
If you have a smaller wiki, teergrubing (tar-pitting) certain user-agents (like "Java", "larbot", "-" etc.) might also make a lot of sense. See http://bloodgate.com/drowns/example for the effect this has :)
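If you'd rather refuse those user-agents outright than tar-pit them, a simple server-side rule works too. A sketch using Apache's mod_rewrite (the agent names here are just the examples from above; adjust the patterns to whatever is hammering your logs):

```apache
# Reject requests from known-abusive or anonymous user-agents
# with 403 Forbidden before they ever reach the wiki.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Java   [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]
```

Unlike robots.txt, this doesn't depend on the bot cooperating, though determined spambots can of course fake their user-agent string.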
All the best,
Tels
--
Signed on Tue May 1 10:23:37 2007 with key 0x93B84C15.
Get one of my photo posters: http://bloodgate.com/posters
PGP key on http://bloodgate.com/tels.asc or per email.
"A witty saying proves nothing."
-- Voltaire
mediawiki-l@lists.wikimedia.org