Try #2: Idea for learning spam filter - Wikitech-l

22 Jan 2006

I posted previously about this and got no responses, but I think it's a
really good idea, so I'm trying one more time.  I've been thinking about a
filter that adds urls to a blacklist after X number of repeat posts of that
url within Y amount of time.

The idea is that the normal behaviour for a wiki is that it is very unlikely
for an url to be referenced in an edit more than X times in 24 hours (I'll
postulate twice, just for fun), but it is very common behaviour for a
spammer.  Therefore, if one could ascertain (do some tests) that some large
percentage (say 99.9%) of valid (non-spam) urls posted via edits occur less
than X in 24 hours, you could put in a filter inconveniencing a very small
percentage of users that would autmatically add those urls to a blacklist.

I think it's worth testing anyway.  What I'm proposing is to collect urls
posted via edit in a short period of time (say a week) and do an analysis.
Anybody else like this idea?  I'd be willing to write a script, but there'd
have to be a hook, or a small hack in editpage to call it.

Best Regards,
Aerik