Tim Starling wrote:
> Gerrit wrote:
>> The bot I am thinking of would follow Newpages live. It fetches each page and checks it against its database. If it is classified as ham, it moves on; if it is classified as unsure, it asks the user whether the page is {{delete}}-material: if yes, it trains the text as spam and prepends {{delete}} to the article; if no, it trains it as ham. It could add a comment to the article or a message to the talk page: <!-- classified by ... as ... with score ... -->
> I was thinking about Bayesian filters too, but as a means of marking changes in RC for users to deal with, not as a bot. As for marking diffs, my idea would be to mark reductions in article size as such, and to run the filter only on added text.
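A minimal sketch of the Newpages workflow Gerrit describes above, assuming Python. fetch_page, classify, train, ask_user, prepend and append_comment are hypothetical helpers standing in for whatever wiki interface and Bayesian library a prototype ends up using:

    def handle_new_page(title):
        text = fetch_page(title)             # hypothetical helper
        verdict, score = classify(text)      # 'ham', 'spam' or 'unsure'
        if verdict == 'ham':
            return                           # nothing to do
        if verdict == 'unsure':
            if ask_user('Is [[%s]] {{delete}}-material?' % title):
                train(text, as_spam=True)
                prepend(title, '{{delete}}')
            else:
                train(text, as_spam=False)
        # an outright 'spam' verdict is not spelled out in the mail;
        # in any case, leave the suggested audit trail:
        append_comment(title, '<!-- classified by bot as %s with score %.2f -->' % (verdict, score))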
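And a sketch of Tim's variant: flag pure size reductions as such and run the classifier only on the text a diff adds. difflib is in the standard library; old and new are the two revision texts, classify the same hypothetical hook:

    import difflib

    def added_text(old, new):
        old_lines, new_lines = old.splitlines(), new.splitlines()
        matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
        added = []
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op in ('insert', 'replace'):
                added.extend(new_lines[j1:j2])
        return '\n'.join(added)

    def mark_change(old, new, classify):
        if len(new) < len(old):
            return 'size reduction'          # marked as such, not scored
        return classify(added_text(old, new))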
Yes. I think I could find time to write a prototype next weekend. So far, I see two ways to track changes live:
- listen to #enrc.wikipedia (not sure about bot policy...; see the sketch below)
- listen to Zwitter's rcdumper (should be possible)
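A rough sketch of the IRC option, assuming Python and plain sockets. The server name and nick are placeholders (only the channel name comes from the mail above), classify is again a hypothetical hook, and the registration handshake and error handling are omitted:

    import socket

    SERVER = 'irc.example.org'    # placeholder: whichever server carries the feed
    CHANNEL = '#enrc.wikipedia'

    def classify(notice):
        print(notice)             # hypothetical hook into the Bayesian filter

    sock = socket.create_connection((SERVER, 6667))
    sock.sendall(b'NICK rcbayesbot\r\n')
    sock.sendall(b'USER rcbayesbot 0 * :Bayesian RC bot\r\n')
    sock.sendall(('JOIN %s\r\n' % CHANNEL).encode())

    buf = b''
    while True:
        buf += sock.recv(4096)
        while b'\r\n' in buf:
            raw, buf = buf.split(b'\r\n', 1)
            line = raw.decode('utf-8', 'replace')
            if line.startswith('PING'):
                sock.sendall(('PONG' + line[4:] + '\r\n').encode())
            elif 'PRIVMSG %s' % CHANNEL in line:
                classify(line.split(':', 2)[2])   # text after the second ':'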
It would be easier if there were a server:port socket with an rcdumper writing the RCs of all wikis to the port, where bots like this one could listen. It's not really necessary, though.
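Such a service could be quite small: a fan-out process that reads rcdumper's output (stdin stands in for it here) and rebroadcasts every line to all connected bots. A sketch under those assumptions; the port number is arbitrary:

    import socket, sys, threading

    clients = []
    lock = threading.Lock()

    def accept_loop(server):
        # keep accepting bot connections in the background
        while True:
            conn, _ = server.accept()
            with lock:
                clients.append(conn)

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(('', 9390))                  # arbitrary port
    server.listen(5)
    threading.Thread(target=accept_loop, args=(server,), daemon=True).start()

    for line in sys.stdin:                   # rcdumper output piped in
        data = line.encode()
        with lock:
            for conn in clients[:]:          # iterate over a copy
                try:
                    conn.sendall(data)
                except OSError:              # drop dead clients
                    clients.remove(conn)
                    conn.close()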
yours, Gerrit.
P.S. I tested whether my own e-mail Bayesian database could serve as a start, but it certainly could not: [[World War II]] was classified as 99% spam because of words like oil and finance ;-).
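That failure mode falls straight out of Graham-style naive-Bayes combining: a handful of tokens that are spammy in e-mail dominate the overall score, whatever the rest of the article says. The token probabilities here are invented for illustration:

    # combined score = p1*...*pn / (p1*...*pn + (1-p1)*...*(1-pn))
    def combine(probs):
        p, q = 1.0, 1.0
        for x in probs:
            p *= x
            q *= 1.0 - x
        return p / (p + q)

    # four e-mail-spammy tokens ('oil', 'finance', ...) with made-up probabilities
    print(combine([0.95, 0.97, 0.90, 0.93]))   # ~0.99999: "99% spam"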