Actually, I am afraid that for CCI we will at some point have to use a bot to remove all of the added text. I do not see any other scalable solution.
Cheers,
Yaroslav
On Mon, Jun 17, 2019 at 5:36 PM Stephen Philbrick <stephen.w.philbrick@gmail.com> wrote:
I have seen a couple of comments on copyright issues in the last couple of days, so I thought I'd share some information that I think may not be well known to everyone.
Very roughly, copyright issues (text) can be viewed in three categories:
1. Addition of copyrighted material to articles in years past, not yet removed (one-off)
2. Same as above, except by a serial violator
3. Close to real-time edits which may include copyrighted material
The reason for distinguishing these three categories is that our approach and success rates are very different.
In case 1, an editor identifies what they believe to be a copyright issue in an existing article. They can report it to Wikipedia:Copyright_problems. In the case of a single issue or a very small handful of issues, those items are identified and taken care of by volunteers. (I think this aspect is handled adequately — I used to be active there but haven't been recently)
The second case arises when a potential violation is identified and an examination of the editor's contributions reveals many more examples (typically five or more). If this occurs, it is referred to Wikipedia:Contributor copyright investigations. A CCI is opened, and the intent is to examine every single edit by that editor. This aspect is extremely backlogged. I've spent many hours working on CCIs, but it isn't easy, it isn't rewarding, and it is discouraging because I think the backlog is increasing rather than decreasing. (This isn't due to newly created copyright issues but to newly found ones.)
The third case is handled by Copy Patrol, a Foundation-created tool that examines all new edits in close to real time and generates a report, which is then reviewed by volunteers.
I want to emphasize this third aspect for multiple reasons. I think it is one of the least-known tools. Some of the prior emails on the subject leave the impression that their authors are unaware that this tool exists. It works very well: almost all of the several hundred reports each week are reviewed, most within 24 hours.
Good news:
- Copy Patrol is working, so my guess is that the growth in true copyright issues is close to nonexistent.
Bad news:
- Copy Patrol is adequately staffed, but just barely. One editor handles far more than half of all of these reports (major kudos to Diannaa), but that much reliance on a single volunteer is not good for the long-term health of the project.
- The Copy Patrol tool is pretty good and was being improved for a while, but I've identified some desirable improvements, and my sense is that further enhancement is very much a back-burner project.
- CCI clearance is going to take many years.
Phil (Sphilbrick)
_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe