I am currently reviewing work on spam detection on Wikipedia. West et al. (2011) https://dl.acm.org/doi/pdf/10.1145/2038558.2038574 found that *the length (in characters) of the revision summary* was one of the features with the greatest weight in the final classifier.
Oh yeah, adding some quantitative evidence to what Jonathan pointed out about blank edit summaries being a useful signal: "Some 88% of spam leaves [the edit summary] blank...", which they contrast with only 17% of external link additions by trusted users lacking an edit summary.
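For anyone who wants to check these two signals on their own data, here is a minimal sketch of how the summary-length feature and the blank-summary rate could be computed. It is not West et al.'s implementation: the `Revision` record, the `is_spam` label, the choice to strip whitespace before measuring length, and the toy data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Revision:
    """Hypothetical minimal record of one edit (not a MediaWiki type)."""
    summary: str   # edit summary as typed by the editor, "" if left blank
    is_spam: bool  # illustrative label, e.g. from manual review


def summary_length(rev: Revision) -> int:
    """Length in characters of the summary, ignoring surrounding whitespace.

    Stripping whitespace is an assumption; the paper may count differently.
    """
    return len(rev.summary.strip())


def blank_rate(revs: Sequence[Revision]) -> float:
    """Fraction of revisions with a blank summary (0.0 for an empty input)."""
    if not revs:
        return 0.0
    blank = sum(1 for r in revs if summary_length(r) == 0)
    return blank / len(revs)


# Toy data, invented purely to exercise the functions.
revisions = [
    Revision("", True),
    Revision("", True),
    Revision("added official site", True),
    Revision("fix typo in lead", False),
    Revision("", False),
    Revision("add citation for 2019 census figure", False),
    Revision("rv unsourced change", False),
]

spam = [r for r in revisions if r.is_spam]
ham = [r for r in revisions if not r.is_spam]

print(f"blank among spam:     {blank_rate(spam):.2f}")
print(f"blank among non-spam: {blank_rate(ham):.2f}")
```

On real data the labels would have to come from somewhere (reverts, blocks, manual review), and that choice probably matters more than the feature code itself.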
On Thu, Aug 5, 2021 at 5:18 AM Pablo Aragón paragon@wikimedia.org wrote:
Hi Isaac,
I am currently reviewing work on spam detection on Wikipedia. West et al. (2011) https://dl.acm.org/doi/pdf/10.1145/2038558.2038574 found that *the length (in characters) of the revision summary* was one of the features with the greatest weight in the final classifier.
Best,
On Wed, Aug 4, 2021 at 11:46 PM Isaac Johnson isaac@wikimedia.org wrote:
Thanks all for the feedback! If anyone thinks of more, by all means send over.
- One of the reasons why any suggestion that we make edit summaries compulsory is that as long as they are optional, blank edit summaries are a great way to identify vandals.
This is a pretty interesting point. For further context, I'm asking because I'm mentoring a researcher who will be looking into edit summary usage and I wanted to make sure we weren't asking questions that had already been answered elsewhere. The research is still in the formative stages of figuring out what additional research might be useful and just having a better understanding of the distribution of edit types. When I think of tools / interventions based on what little I know, however, it's mainly along the lines of what sorts of edit tags (or similar filters) could be auto-generated to further contextualize edit summaries. Helping editors quickly match their edit to templated/canned messages is an idea that gets floated around too but could be counterproductive for the vandalism case as you point out.
There is a long-standing tool to search them at
https://sigma.toolforge.org/summary.py?name=Stuartyeates&search=re-revie...
In case you're looking for code to reuse.
Thanks! Glad to see this tool exists!
For completeness, it was also pointed out to me that Wattenberg, Viégas, and Hollenbach's 2007 paper "Visualizing Activity on Wikipedia with Chromograms" makes heavy use of edit summaries and provides some insight into their usage: https://link.springer.com/content/pdf/10.1007/978-3-540-74800-7_23.pdf
Best, Isaac
On Tue, Aug 3, 2021 at 3:48 PM Stuart A. Yeates syeates@gmail.com wrote:
There is a long-standing tool to search them at
https://sigma.toolforge.org/summary.py?name=Stuartyeates&search=re-revie...
In case you're looking for code to reuse.
cheers
stuart
-- ...let us be heard from red core to black sky
On Wed, 4 Aug 2021 at 05:38, WereSpielChequers werespielchequers@gmail.com wrote:
Dear Isaac,
I'm not aware of any research on this. But there are a couple of common assumptions that you could check as part of any research.
1. One of the reasons why any suggestion that we make edit summaries compulsory is that as long as they are optional, blank edit summaries are a great way to identify vandals.
2. There is also a certain amount of "sneaky vandalism" denoted by edits that get reverted, or reverted and the perpetrators get warned for vandalism or blocked as a "vandalism only account".
3. Though we admins have the technology to blank people's edit summaries, it is very rarely used.
Regards
Jonathan
On Tue, 3 Aug 2021 at 16:20, Isaac Johnson isaac@wikimedia.org wrote:
Does anyone know of any research or statistics around edit summary https://en.wikipedia.org/wiki/Help:Edit_summary usage on Wikipedia? All I could find in a quick scan was some statistics from 2010 (https://meta.wikimedia.org/wiki/Usage_of_edit_summary_on_Wikipedia). I'm curious if anyone has more updated statistics, or, even better: a more thorough analysis of how edit summaries are used by editors -- i.e. how complete they are, to what degree they represent the "what" vs. the "why", how often they are misleading, etc.
Best, Isaac
-- Isaac Johnson (he/him/his) -- Research Scientist -- Wikimedia Foundation
Wiki-research-l mailing list -- wiki-research-l@lists.wikimedia.org
To unsubscribe send an email to wiki-research-l-leave@lists.wikimedia.org