In light of the editor retention problem,
I suggest we have to be very careful with any kind of “plagiarism
detector” software because we have real subject matter experts among our
editors. I’m aware of members of local history societies who have had issues
with copyright violation because they have content on their own websites which
they then contribute to Wikipedia. It’s not a copyright violation because
it’s their own work, but it was deleted, they were accused of copyright
violation, and they were naturally very unhappy about both. Being new users,
they did not know how to get this redressed. They asked me for help, but I got
nowhere with the editor who deleted the material, who would not accept their
assertion that they were the original authors (how on earth could they prove
it?). As a result, none of them are now active editors. Having had a whole
bunch of my own images nearly deleted from Commons because they appear on my
own website (despite my user name being my real name and my real name being all
over my website), I know how they feel about having accusations of copyright
violation all over their contributions. It’s really offensive. Strangely,
we have no way to whitelist particular websites in relation to particular users
(in theory, you’d want to be able to whitelist books and off-line
resources too but in practice “copies” from these are far less
likely to be noticed), so the same problem can arise again and again for an
individual contributor.

So I would be very hesitant about putting
any visible tag on an article suggesting it was a copyright violation (as it
seems to me it is both offensive and potentially libellous to the editor who
has in good faith contributed their own work). I think any concern about
copyright has to be first raised with the editor involved as a question NOT an
accusation. And I note that it is often very difficult to communicate with
new/occasional editors as they often have no email address associated with
their account and they don’t see talk page message banners unless they
are logged in (for example, via “remember me”). It’s ironic that at the time
a contributor is most likely to want or need help, we are in the worst
position to know they want it, or to offer it if we see they need it.

So, I’m with Jane on this one. It’s
easy enough to detect a lot of potential copyright violations automatically. What’s
hard and very much a manual task is confirming it really is a copyright violation
and, where required, educating the contributor. I think there’s a real
danger in automating the first part without a good solution to the second part.
We have far too many editors who use tools as weapons already, so I am
reluctant to give them more weapons.
Kerry