On Thu, Dec 4, 2014 at 8:08 PM, MZMcBride <z@mzmcbride.com> wrote:
> <snip>
> Robert, let me know if you want access on mediawiki.org to look at the deleted edits, though they're quite boring.
It wouldn't hurt to take a look, though I suspect that feedback from people who review these things regularly would be more useful, at least until some sort of large-scale study could be mounted.
I suspect that much of the spam consists of the obvious things, such as external links to junk sites and repetitive promotional posts, though perhaps there are also less obvious types of spam?
I suspect we could weed out much of the spammy link behavior with an external link classifier that uses knowledge of which external links are frequently added and which are frequently removed to generate automatic good / suspect / bad ratings for new external links (or domains). Good links (e.g. NYTimes, CNN) might be allowed automatically for all users. Suspect links (e.g. unknown or rarely used domains) might be allowed automatically for established users, while new users and IPs would be challenged with captchas or other tools. Bad links (i.e. those repeatedly spammed and removed) could be detected and blocked automatically.
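To make the idea a bit more concrete, here is a minimal sketch of such a classifier, assuming we can get per-domain counts of how often links are added and how often they are later removed from the edit history. The thresholds, function names (rate_domain, decide) and the exact good / suspect / bad cutoffs are all hypothetical and would need tuning against real data; this is not an existing MediaWiki feature.

```python
from collections import Counter
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Hypothetical thresholds; real values would need tuning against actual edit data.
MIN_OBSERVATIONS = 20    # a domain added fewer times than this counts as "rarely used"
GOOD_MAX_REMOVAL = 0.1   # at most 10% of additions later removed => "good"
BAD_MIN_REMOVAL = 0.7    # at least 70% of additions later removed => "bad"
MIN_BAD_REMOVALS = 5     # require repeated removals before calling a domain "bad"


def domain_of(url: str) -> str:
    """Reduce a URL to its host name, ignoring a leading 'www.' (Python 3.9+)."""
    host = urlparse(url).hostname or ""
    return host.removeprefix("www.")


@dataclass
class LinkStats:
    """Per-domain counts of external links added and later removed."""
    added: Counter = field(default_factory=Counter)
    removed: Counter = field(default_factory=Counter)

    def record_add(self, url: str) -> None:
        self.added[domain_of(url)] += 1

    def record_remove(self, url: str) -> None:
        self.removed[domain_of(url)] += 1


def rate_domain(stats: LinkStats, domain: str) -> str:
    """Return 'good', 'suspect' or 'bad' based on how often links get removed."""
    added = stats.added[domain]
    removed = stats.removed[domain]
    # Guard against division by zero; removals with no recorded additions count as high.
    removal_rate = removed / max(added, 1)
    if removed >= MIN_BAD_REMOVALS and removal_rate >= BAD_MIN_REMOVAL:
        return "bad"
    if added >= MIN_OBSERVATIONS and removal_rate <= GOOD_MAX_REMOVAL:
        return "good"
    # Unknown, rarely used, or mixed-history domains.
    return "suspect"


def decide(rating: str, established_user: bool) -> str:
    """Map a rating to an action: 'allow', 'captcha' or 'block'."""
    if rating == "good":
        return "allow"
    if rating == "bad":
        return "block"
    return "allow" if established_user else "captcha"


if __name__ == "__main__":
    stats = LinkStats()
    for _ in range(50):
        stats.record_add("https://www.nytimes.com/article")
    stats.record_remove("https://www.nytimes.com/article")
    for _ in range(8):
        stats.record_add("http://cheap-pills.example/buy")
        stats.record_remove("http://cheap-pills.example/buy")

    for domain in ("nytimes.com", "cheap-pills.example", "unknown-blog.example"):
        rating = rate_domain(stats, domain)
        # nytimes.com good allow / cheap-pills.example bad block / unknown-blog.example suspect captcha
        print(domain, rating, decide(rating, established_user=False))
```

Working per domain rather than per full URL keeps the counts dense enough to be meaningful; a real version would probably also want to decay old counts and exclude removals made by the person who added the link.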
-Robert Rohde