So, I just installed the CRM114 Markovian spam filtering software:
http://crm114.sourceforge.net/
The whole thing is based on Bayesian filtering, which is just a way to get very dumb software to make really smart decisions: it counts which words show up in spam and which show up in legitimate mail, then scores new messages against those counts. With sufficient training, a very simple piece of software can make very accurate distinctions between spam and non-spam email messages. See Paul Graham's famous "A Plan for Spam" about this:
http://www.paulgraham.com/spam.html
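To make the "dumb software, smart decisions" bit concrete, here's a minimal sketch of a word-counting Bayesian classifier in Python. It's a plain naive Bayes with add-one smoothing, which is an assumption on my part: it is not the exact formula from Graham's essay, and it is not what CRM114 does internally. The names (tokenize, NaiveBayes, the "spam"/"good" labels) are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Tiny word-count classifier with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token counts
        self.doc_counts = Counter()  # label -> number of training texts

    def train(self, text, label):
        """Add one labeled example to the counts."""
        self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.doc_counts[label] += 1

    def classify(self, text):
        """Return the label with the highest log-probability, or None if untrained."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label, counts in self.word_counts.items():
            # Prior: how often this label showed up in training.
            score = math.log(self.doc_counts[label] / total_docs)
            total = sum(counts.values())
            for tok in tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[tok] + 1) / (total + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = NaiveBayes()
nb.train("cheap pills buy now", "spam")
nb.train("buy cheap watches now", "spam")
nb.train("meeting notes for the wiki release", "good")
nb.train("patch review for the parser", "good")
print(nb.classify("buy cheap pills"))
print(nb.classify("parser patch review"))
```

All the "intelligence" is in the counts; the code itself never knows what a pill or a parser is.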
The CRM114 stuff is Markovian, which means it's _even_dumber_ than Bayesian stuff, and makes _even_smarter_ decisions. More or less. (Roughly: it weighs short runs of words rather than single words.)
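One way to picture the difference: instead of counting lone words, a Markovian-style filter counts short word sequences, so "buy ... pills" can count as evidence on its own. The sketch below is a simplified stand-in of my own, not CRM114's actual feature scheme; phrase_features, the window size, and the "+gap+" naming are all assumptions for illustration.

```python
import re


def phrase_features(text, window=3):
    """Return single words plus word pairs found within a sliding window.

    Each pair records how far apart the two words were, so
    "buy cheap pills" yields "buy+1+cheap" and "buy+2+pills".
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    features = []
    for i, tok in enumerate(tokens):
        features.append(tok)
        for gap in range(1, window):
            if i + gap < len(tokens):
                features.append(f"{tok}+{gap}+{tokens[i + gap]}")
    return features


print(phrase_features("buy cheap pills"))
```

Feed these features into the same kind of counting scheme a Bayesian filter uses for words, and the filter starts picking up on phrasing, not just vocabulary.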
Anyway, one thing that's mentioned on the CRM114 page is that folks use the same technology for other kinds of text sorting. System administrators, for example, can sort log file entries into ones they're interested in and ones they're not.
And I was thinking: you know, it'd be nice to be able to flag acceptable and problematic articles on MediaWiki sites. Like, say, an admin sees some vandalism going on, and goes to fix it. One of the checkboxes on saving is "Vandalism fix" or some such. This would tag the previous version as... ungood. Something.
And then after a while the software gets good at understanding what's ungood and what's not. And there could be a tracking page to say, "These seem to be pages in an ungood state." And it would be easier to find those and fix 'em.
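Here's a rough sketch of how that loop might hang together, just to show the idea is small. Everything in it is hypothetical: RevisionFilter, save_edit, the vandalism_fix flag, and tracking_page are names I made up, none of this is an existing MediaWiki API, and the scoring is a plain smoothed log-odds word count rather than CRM114 itself.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class RevisionFilter:
    """Hypothetical helper: learns from edits flagged as vandalism fixes."""

    def __init__(self):
        self.counts = {"good": Counter(), "ungood": Counter()}

    def train(self, text, label):
        self.counts[label].update(tokenize(text))

    def save_edit(self, old_text, new_text, vandalism_fix=False):
        """On save: if the box was checked, the old version is ungood, the new one good."""
        if vandalism_fix:
            self.train(old_text, "ungood")
            self.train(new_text, "good")

    def ungood_score(self, text):
        """Smoothed log-odds that text is ungood; positive means 'looks ungood'."""
        vocab = max(1, len(set(self.counts["good"]) | set(self.counts["ungood"])))
        good_total = sum(self.counts["good"].values())
        bad_total = sum(self.counts["ungood"].values())
        score = 0.0
        for tok in tokenize(text):
            p_bad = (self.counts["ungood"][tok] + 1) / (bad_total + vocab)
            p_good = (self.counts["good"][tok] + 1) / (good_total + vocab)
            score += math.log(p_bad / p_good)
        return score

    def tracking_page(self, pages):
        """Return titles whose current text looks ungood, worst first."""
        scored = [(self.ungood_score(text), title) for title, text in pages.items()]
        return [title for score, title in sorted(scored, reverse=True) if score > 0]


f = RevisionFilter()
f.save_edit("lol lol this page is dumb",
            "The parser converts wikitext to HTML.", vandalism_fix=True)
f.save_edit("dumb dumb lol",
            "The database stores page revisions.", vandalism_fix=True)
pages = {
    "Parser": "The parser converts wikitext.",
    "Database": "lol the database is dumb",
}
print(f.tracking_page(pages))
```

The only extra work for admins is ticking the box; the tracking page falls out of the counts.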
~ESP
wikitech-l@lists.wikimedia.org