Hi,
On Tue, 8 Feb 2005, Rhobite wrote:
On Tue, 08 Feb 2005 13:34:48 -0800, Brion Vibber <brion@pobox.com> wrote:
But I don't think CAPTCHA tests are the right approach, due to accessibility issues.
What would you suggest?
Unfortunately I don't have any great suggestions. I've dealt with bot spam on a much smaller scale on my weblog, and it's not a simple problem.
How about a lazy Bayesian similarity checker? Spam bots tend to write the same blabla into several articles. So, after each edit, with a certain (low) probability, check for identical (or similar) words in the last 100 articles, and flag the articles with matches (or the matched words) as potential spam, which can then be blocked more efficiently. Of course, words like "is", "are", "the", etc. will probably match everywhere, but relatively few words are that common (I think something like 1500 in English). The list of common words could be built on the fly.
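To make the idea a bit more concrete, here is a rough Python sketch. It is only an illustration under my own assumptions, not a tested design: the class name, the thresholds (check_probability, common_fraction, min_shared) and the tokenizer are all made up for this example, and it uses plain word overlap rather than anything properly Bayesian.

```python
import random
import re
from collections import Counter, deque

WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Return the set of lowercase words in an edit."""
    return set(WORD_RE.findall(text.lower()))


class LazySimilarityChecker:
    """Hypothetical sketch: occasionally compare a new edit with recent ones."""

    def __init__(self, window=100, check_probability=0.1,
                 common_fraction=0.5, min_shared=5, rng=None):
        # All defaults are illustrative guesses, not tuned values.
        self.recent = deque(maxlen=window)   # (article_id, word set) of last edits
        self.check_probability = check_probability
        self.common_fraction = common_fraction
        self.min_shared = min_shared
        self.rng = rng or random.Random()
        self.doc_freq = Counter()            # in how many edits each word appeared
        self.edits_seen = 0

    def common_words(self):
        """Common-word list built on the fly from the edits seen so far."""
        cutoff = self.common_fraction * self.edits_seen
        return {w for w, n in self.doc_freq.items() if n > cutoff}

    def record_edit(self, article_id, text):
        """Store the edit; with low probability, return {article_id: shared words}."""
        words = tokenize(text)
        flagged = {}
        if self.rng.random() < self.check_probability:
            rare = words - self.common_words()
            for other_id, other_words in self.recent:
                if other_id == article_id:
                    continue
                shared = rare & other_words
                if len(shared) >= self.min_shared:
                    flagged[other_id] = shared
        # Update the statistics after the check, so an edit never matches itself.
        self.doc_freq.update(words)
        self.edits_seen += 1
        self.recent.append((article_id, words))
        return flagged
```

Calling record_edit(article_id, text) after each save returns the recent articles that share at least min_shared uncommon words with the new edit, most of the time without doing any comparison at all. One obvious weakness: if a bot floods enough edits, its words pass the common_fraction cutoff and stop counting, so a real version would need a smarter common-word list.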
Just my 1.5 cents, Dscho