Brion Vibber <brion@...> writes:
Sounds great to me.
See extensions/ConfirmEdit in CVS for my half-done captcha system; it's got a
little thingy for extracting added URLs from an edit.
Brion, thanks! - I checked out your code, and I believe it takes care of all
the functionality except posting the URLs and a timestamp somewhere. For the
sake of simplicity, I'm thinking of just appending them to a text file - does
that sound okay to you? And I'd really love to test it on a large wiki to get
a nice BIG dataset rather quickly - like on EN... is this the right place to
get something like that approved?
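For the record, here's roughly what I have in mind for the logging step,
sketched in Python just to show the shape of it (the real hook would of
course live in the extension's PHP; the file name and function name here are
made up):

import time

LOG_FILE = 'added_urls.log'  # made-up location

def log_added_urls(urls):
    # Append each added URL with a Unix timestamp, one per line,
    # tab-separated so the file stays trivial to parse later.
    now = int(time.time())
    with open(LOG_FILE, 'a') as f:
        for url in urls:
            f.write('%d\t%s\n' % (now, url))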
And Michael Keppler <Michael.Keppler@...> writes:
Александр Сигачёв wrote:
Imagine an edit war about a link. Or the text of an article was
deleted by a vandal, and someone wants to revert the change, but the
text contained links.
To avoid that you could store the pairs of (article, embedded URL) from
changed articles with URLs, not only the plain URLs.
With the list of URLs you can then avoid the mass spamming of a single
URL into thousands of articles (as described before), but with the list
of (article, URL) pairs you can then override this decision and allow
adding or deleting a URL in an article where that same URL has been
added, deleted, or edited during the last 24 hours (or whatever time
span the blacklist algorithm uses).
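Roughly like this, as a sketch (the names are invented, and it assumes the
(timestamp, article, URL) records are kept around somewhere):

import time

WINDOW = 24 * 60 * 60  # 24 hours, or whatever span the blacklist uses

def allow_despite_blacklist(article, url, recent_edits):
    # recent_edits: iterable of (timestamp, article, url) tuples.
    # Permit the edit if the same URL was touched in the same article
    # within the window - e.g. someone reverting vandalism.
    cutoff = time.time() - WINDOW
    return any(ts >= cutoff and a == article and u == url
               for ts, a, u in recent_edits)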
Beautiful suggestion, Michael - that would allow for a finer analysis of
behaviour too. So we'd collect three data points: the URL, the wiki page
being edited, and the datetime. After a short while, say a week, I'd compile
the data into a histogram (I might have to do one histogram per time period -
I haven't finished thinking about this) showing how frequently each URL
appears in *different* articles within a time period, and then check how many
of the frequently appearing URLs are spam (easy enough - look at the page
itself).
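In sketch form the compilation step might look like this (Python again, just
for illustration; it assumes one tab-separated "timestamp article url" record
per line, following Michael's pairs idea):

from collections import defaultdict

PERIOD = 24 * 60 * 60  # bucket size: one histogram per day, say

def url_histogram(log_path):
    # Map each time bucket to {url: set of distinct articles}, since
    # what matters is a URL appearing in *different* articles.
    buckets = defaultdict(lambda: defaultdict(set))
    with open(log_path) as f:
        for line in f:
            ts, article, url = line.rstrip('\n').split('\t')
            buckets[int(ts) // PERIOD][url].add(article)
    # Reduce to counts: how many different articles each URL hit.
    return {bucket: {u: len(arts) for u, arts in urls.items()}
            for bucket, urls in buckets.items()}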
Best Regards,
Aerik