[Wikimedia-l] Copyright infringement - The real elephant in the room
Matthew Flaschen
mflaschen at wikimedia.org
Tue Nov 19 01:07:48 UTC 2013
On 11/16/2013 09:04 AM, Anthony Cole wrote:
> The problem of false positives from mirrors doesn't exist if we scan edits
> as they are made.
Agreed. However, that example is a legal, attributed (at least on the
talk page) copy from a third-party freely licensed text, not a false
positive copy from a Wikipedia mirror.
> Maggie says here
> <https://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard#Emergency_block_of_an_editor_with_which_I_have_been_previously_involved>
> that copyright bots populate
> WP:SCV <https://en.wikipedia.org/wiki/Wikipedia:SCV>. So a
> similarly configured bot could scan recent changes and tag suspected
> copyvios in watchlists and page histories like suspected vandalism is
> currently tagged.
The suspected vandalism checks that actually tag the edit (e.g. "Tag:
possible vandalism") are based on AbuseFilter checks. These are
relatively fast determinations that consider the text of the edit (e.g.
regexes for strings of curse words, or meaningless repeating
characters), and comparisons to the previous version (blanked the
section, blanked the page).
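To make the distinction concrete, here is a rough Python analogue of the kind of cheap, local checks described above. It is not AbuseFilter's own rule language, and the word list, the 10-character repeat threshold, and the function names are all hypothetical. The point is only that every check looks at the edit and the previous revision already in hand, with no database or web lookups.

```python
import re

# Illustrative patterns only (hypothetical); real AbuseFilter rules are
# written in AbuseFilter's own rule language and maintained on-wiki.
REPEATED_CHARS = re.compile(r"(\S)\1{9,}")  # one non-space character 10+ times in a row
BAD_WORDS = re.compile(r"\b(?:idiot|stupid)\b", re.IGNORECASE)  # placeholder word list
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$", re.MULTILINE)  # "== Title ==" lines


def added_lines(old: str, new: str) -> list[str]:
    """Lines present in the new revision but not in the old one (a rough diff)."""
    old_lines = set(old.splitlines())
    return [line for line in new.splitlines() if line not in old_lines]


def quick_flags(old: str, new: str) -> list[str]:
    """Cheap checks that use only the edit text and the previous revision."""
    flags = []
    added = "\n".join(added_lines(old, new))
    if REPEATED_CHARS.search(added):
        flags.append("repeating characters")
    if BAD_WORDS.search(added):
        flags.append("bad words")
    if old.strip() and not new.strip():
        flags.append("blanked the page")
    else:
        # A section heading that existed before but is gone now.
        old_heads = {m.group(2) for m in HEADING.finditer(old)}
        new_heads = {m.group(2) for m in HEADING.finditer(new)}
        if old_heads - new_heads:
            flags.append("removed a section")
    return flags
```

Everything here is string matching on text the server already has, so each check costs about the same on every edit.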
As far as I know, regular AbuseFilter rules cannot query a database or
run a web search to check for copyright violations. An extension could,
in theory, do this. But there could be performance problems, since
AbuseFilter runs on the actual servers (not just some bot's computer) on
every edit.
It is possible for a bot to scan every edit; it just can't use
AbuseFilter tags.
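A bot working outside AbuseFilter could start by polling the MediaWiki API's `list=recentchanges` module. The sketch below assumes the current Action API (`rcprop=sizes` returning `oldlen`/`newlen`, continuation via `rccontinue`); the User-Agent string, the 500-byte "large addition" threshold, and the helper names are hypothetical. It only picks out candidate edits. The actual copyvio comparison against a search engine or database, which is the expensive part, is deliberately left out.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

API_URL = "https://en.wikipedia.org/w/api.php"
USER_AGENT = "copyvio-sketch/0.1 (hypothetical example bot)"  # hypothetical


def recent_changes_url(limit: int = 50, cont: Optional[str] = None) -> str:
    """Build a list=recentchanges query for ordinary edits and page creations."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctype": "edit|new",
        "rcprop": "title|ids|sizes|timestamp|user",
        "rclimit": str(limit),
        "format": "json",
    }
    if cont is not None:
        params["rccontinue"] = cont  # resume where the previous batch stopped
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_changes(payload: dict) -> tuple[list[dict], Optional[str]]:
    """Extract title, revision ids and size change, plus the continuation token."""
    edits = [
        {
            "title": rc["title"],
            "revid": rc["revid"],
            "old_revid": rc["old_revid"],
            "added_bytes": rc["newlen"] - rc["oldlen"],
        }
        for rc in payload.get("query", {}).get("recentchanges", [])
    ]
    return edits, payload.get("continue", {}).get("rccontinue")


def large_additions(edits: list[dict], min_bytes: int = 500) -> list[dict]:
    """Keep edits that added enough text to justify a slow copyvio lookup."""
    return [e for e in edits if e["added_bytes"] >= min_bytes]


def fetch(url: str) -> dict:
    """Perform the HTTP request (needs network access)."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A polling loop would call `fetch(recent_changes_url(cont=token))`, pass the result through `parse_changes` and `large_additions`, and queue the survivors for the slow check. Because that check runs on the bot's own machine, a slow lookup delays only the bot, not anyone's save.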
Matt Flaschen