Dear Listserv,
Hope all is well. I am mapping IP address edits per country for 271 language Wikipedias. I would like to exclude IP addresses that are vandalism. I was thinking of using the ipblocks table to identify the IP addresses to be excluded. Because this project covers so many different languages and my programming skills are intermediate, I would like to use the Wikipedia tables or registers that Wikipedians in those languages use to mark vandalism. If anyone has another idea, I would be most grateful. Perhaps I am missing a method that Wikipedians across languages use to mark vandalism.
Thank you, Tom
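If the ipblocks route is taken, one wrinkle is that a block target can be a single address or a CIDR range, so comparing IP strings directly would miss edits from inside a blocked range. Below is a minimal Python sketch. It assumes the block targets have already been exported as strings shaped like the values in the `ipb_address` column (for example "192.0.2.7" or "198.51.100.0/24"). The function names and example data are hypothetical. As far as I know, ipblocks only holds currently active blocks, so past blocks would have to come from the block log. Also, as Kerry notes below, shared IP addresses are often not blocked at all.

```python
import ipaddress

def load_block_targets(targets):
    """Parse block targets (single IPs or CIDR ranges) into network objects.

    Assumes strings shaped like ipblocks.ipb_address values, e.g.
    "192.0.2.7" or "198.51.100.0/24". Blocks on account names are skipped.
    """
    networks = []
    for target in targets:
        try:
            # strict=False tolerates host bits set in a range like 198.51.100.5/24
            networks.append(ipaddress.ip_network(target, strict=False))
        except ValueError:
            continue  # not an IP address or range, e.g. a blocked username
    return networks

def is_blocked(ip, networks):
    """True if `ip` falls inside any blocked address or range."""
    addr = ipaddress.ip_address(ip)
    # Membership is simply False when IPv4 is tested against IPv6 or vice versa.
    return any(addr in net for net in networks)

# Hypothetical example data: (ip, rev_id) pairs.
blocks = load_block_targets(["192.0.2.7", "198.51.100.0/24", "Vandal123", "2001:db8::/32"])
edits = [("192.0.2.7", 101), ("198.51.100.42", 102), ("203.0.113.5", 103), ("2001:db8::1", 104)]
kept = [(ip, rev_id) for ip, rev_id in edits if not is_blocked(ip, blocks)]
print(kept)  # [('203.0.113.5', 103)]
```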
I’m not quite sure what you want. An IP address may be used by one or many anonymous contributors (workplaces, universities and schools can often appear to Wikipedia as a single IP address). Each of those contributors may make one or more edits. Each of those edits may be vandalism (a deliberate attempt to cause damage, and hopefully reverted), a poor-quality but good-faith edit (which may be reverted for a wide variety of reasons) or an acceptable contribution.
Also, there is a reluctance to block a known multi-user IP address because of misbehaviour by what appears to be one person.
So, when you say “IP addresses that are vandalism”, can you be more specific about what you want or don’t want?
Kerry
Sent from my iPad
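Kerry's point that one IP address can carry a mix of good and bad edits matters for per-country counts. Excluding whole IP addresses throws away the good edits from a shared address along with the bad ones. The sketch below compares the two choices. It assumes edits have already been reduced to hypothetical (rev_id, ip, country) tuples by some geolocation step, and that some other process has produced a set of flagged revision IDs. Neither step is shown.

```python
from collections import Counter

def edits_per_country(edits, flagged_rev_ids):
    """Edit-level filtering: drop only the flagged edits themselves."""
    return Counter(country for rev_id, _ip, country in edits
                   if rev_id not in flagged_rev_ids)

def edits_per_country_excluding_ips(edits, flagged_rev_ids):
    """IP-level filtering: drop every edit from an IP with any flagged edit."""
    bad_ips = {ip for rev_id, ip, _country in edits if rev_id in flagged_rev_ids}
    return Counter(country for _rev_id, ip, country in edits
                   if ip not in bad_ips)

# Hypothetical data: a shared school IP with one bad edit among three.
edits = [
    (1, "192.0.2.7", "AU"),
    (2, "192.0.2.7", "AU"),
    (3, "192.0.2.7", "AU"),
    (4, "203.0.113.5", "US"),
]
flagged = {2}
print(edits_per_country(edits, flagged))                # Counter({'AU': 2, 'US': 1})
print(edits_per_country_excluding_ips(edits, flagged))  # Counter({'US': 1})
```

The two counts diverge exactly when a flagged IP also made unflagged edits, which is the shared-address case.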
On 16 Jan 2019, at 9:03 pm, Thomas Stieve tomthirteen@email.arizona.edu wrote:
--
Thomas Stieve
Ph.D. Candidate
School of Geography and Development
University of Arizona
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
And, FWIW, I don’t think we have a flag on an edit saying that it is vandalism. We do have a page history that can show that an edit was reverted. On inspection of the edit summary of the reversion, there may be some textual clues, e.g. “rvv”, a common abbreviation for “reverting vandalism”. There may also be a message on the reverted IP’s talk page that uses words suggesting vandalism (noting that many of these messages are templates and so have a highly predictable structure, usually starting with neutral terms like “not constructive” and escalating to explicit use of the word “vandalism” in some form). However, these messages may not link to the specific problematic edit, so you would be looking for talk page messages appearing “shortly” after the revert of the edit.
Not all vandalism is immediately detected; by the time it is noticed, a number of other edits may have intervened, which can make a simple revert impossible.
Not all vandalism is removed with a revert; it may be removed through “normal editing”, perhaps as part of a larger edit.
Not all reverted edits are vandalism. They may be well-intentioned but breach a Wikipedia policy (e.g. the requirement for citations, or presenting an opinion as a fact). Some acceptable edits get reverted for a range of (mostly unacceptable) reasons like gatekeeping, style errors, UI errors (if the GUI loads slowly, my click to say thanks sometimes turns into a revert!), etc.
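The revert and edit-summary clues Kerry describes can be approximated in code. The sketch below uses "identity revert" detection, a common technique added here rather than one proposed in the thread. If a revision's text hash (MediaWiki stores one per revision as `rev_sha1`) matches an earlier revision's, every revision in between was reverted. The sketch assumes one page's history is already loaded oldest-first into a hypothetical `Revision` record. It keeps only the reverts whose summary mentions "rvv" or "vandal", and adds a helper for the "shortly after" talk-page check. The English keyword pattern, the 15-revision radius and the one-hour window are all arbitrary assumptions. The caveats above still apply: this misses vandalism removed by normal editing and catches good-faith edits that were reverted.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Revision:
    rev_id: int
    user: str            # IP address for anonymous edits
    timestamp: datetime
    sha1: str            # hash of the full page text after this revision
    comment: str         # edit summary

# Assumed English-only hints; other language editions would need their own.
VANDALISM_HINTS = re.compile(r"\b(?:rvv\b|vandal)", re.IGNORECASE)

def find_identity_reverts(revisions, radius=15):
    """Map each reverted rev_id to the Revision that reverted it.

    `revisions` is one page's history, oldest first. A revision whose sha1
    matches an earlier revision (within `radius` steps) restores that text,
    so every revision strictly between the two is treated as reverted.
    """
    reverted = {}
    for i, rev in enumerate(revisions):
        # Search backwards, nearest first, skipping the immediate predecessor.
        for j in range(i - 2, max(-1, i - radius - 1), -1):
            if revisions[j].sha1 == rev.sha1:
                for k in range(j + 1, i):
                    reverted.setdefault(revisions[k].rev_id, rev)
                break
    return reverted

def likely_vandalism(revisions, radius=15):
    """Reverted rev_ids whose reverting edit summary hints at vandalism."""
    return {
        rev_id
        for rev_id, reverter in find_identity_reverts(revisions, radius).items()
        if VANDALISM_HINTS.search(reverter.comment)
    }

def warned_shortly_after(revert_time, warning_times, window=timedelta(hours=1)):
    """True if any talk-page warning was posted within `window` after the revert."""
    return any(revert_time <= t <= revert_time + window for t in warning_times)

# Hypothetical page history, oldest first.
t0 = datetime(2019, 1, 16, 12, 0)
history = [
    Revision(1, "ExampleUser", t0, "a", "new article"),
    Revision(2, "192.0.2.7", t0 + timedelta(minutes=1), "b", ""),
    Revision(3, "OtherUser", t0 + timedelta(minutes=2), "a", "rvv"),
    Revision(4, "198.51.100.9", t0 + timedelta(minutes=3), "c", "fix"),
    Revision(5, "ThirdUser", t0 + timedelta(minutes=4), "a", "Undid revision 4: unsourced"),
]
print(likely_vandalism(history))  # {2}
```

In this example both revisions 2 and 4 are reverted, but only revision 2 is flagged. The summary on revision 4's revert points to a sourcing problem, not vandalism, which is the distinction drawn above.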
And finally, as someone who does her watchlist diligently, sometimes you just can’t tell if an edit is vandalism. The classic case is a small change in dates. If there is no citation, or the citation is to an offline resource or a dead link, it may be impossible to tell whether the changed information is a genuine correction or a deliberately damaging action. Obviously I may have my suspicions, but I do have the obligation to Assume Good Faith. It’s not easy.
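Small numeric changes like these date edits can at least be identified automatically, even if they cannot be judged automatically. The sketch below uses Python's standard `difflib` module and assumes you have the page text before and after an edit. The function name is hypothetical. A True result marks an edit for separate handling or review; it says nothing about whether the edit was vandalism.

```python
import difflib

def only_digits_changed(old, new):
    """True if the texts differ and every changed span consists of digits.

    A rough way to spot 'small change in dates' edits; it cannot tell a
    genuine correction from a deliberately wrong number.
    """
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changed = False
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        changed = True
        # Characters removed from `old` plus characters added in `new`.
        segment = old[i1:i2] + new[j1:j2]
        if not segment.isdigit():
            return False
    return changed

print(only_digits_changed("Born 12 May 1921.", "Born 12 May 1912."))  # True
print(only_digits_changed("Born in Paris.", "Born in Lyon."))          # False
```

`str.isdigit` also accepts non-ASCII digits, which may help across language editions that do not use Western numerals.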
Kerry
Tom,
You may be interested in the ORES platform https://www.mediawiki.org/wiki/ORES, which provides a vandalism detection service across many (but not all) Wikipedia languages. It works at the revision level, not the user level, but I suppose you could filter and/or aggregate the scores by IP address.
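One way to aggregate ORES revision scores by IP address, as suggested above, is to average the "damaging" probability over each IP's edits. The sketch below assumes the general JSON shape returned by the ORES v3 scores endpoint (wiki, then "scores", then revision ID, then model, then "score" or "error"). That shape should be checked against the ORES documentation before relying on it. Fetching is left out, and both the response and the revision-to-IP mapping are hypothetical inputs.

```python
from collections import defaultdict

def damaging_probabilities(response, wiki):
    """Extract P(damaging) per revision from an ORES-style scores response.

    Assumed shape (verify against the ORES documentation):
    {wiki: {"scores": {"<rev_id>": {"damaging": {"score": {"probability": {"true": p}}}}}}}
    Revisions that came back with an error instead of a score are skipped.
    """
    probs = {}
    for rev_id, models in response[wiki]["scores"].items():
        score = models.get("damaging", {}).get("score")
        if score is not None:
            probs[int(rev_id)] = score["probability"]["true"]
    return probs

def mean_damaging_per_ip(probs, rev_to_ip):
    """Average the per-revision probabilities over each IP's scored edits."""
    by_ip = defaultdict(list)
    for rev_id, p in probs.items():
        by_ip[rev_to_ip[rev_id]].append(p)
    return {ip: sum(ps) / len(ps) for ip, ps in by_ip.items()}

# Hypothetical response and revision-to-IP mapping.
response = {"enwiki": {"scores": {
    "101": {"damaging": {"score": {"prediction": True, "probability": {"false": 0.1, "true": 0.9}}}},
    "102": {"damaging": {"score": {"prediction": False, "probability": {"false": 0.8, "true": 0.2}}}},
    "103": {"damaging": {"error": {"message": "hypothetical error"}}},
}}}
rev_to_ip = {101: "192.0.2.7", 102: "192.0.2.7", 103: "203.0.113.5"}
print(mean_damaging_per_ip(damaging_probabilities(response, "enwiki"), rev_to_ip))
```

Averaging is only one choice. A maximum, or the share of edits above a threshold, would weight a single bad edit from a busy shared IP differently.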
Best, Jonathan
On Wed, Jan 16, 2019 at 1:19 PM Kerry Raymond kerry.raymond@gmail.com wrote:
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Hi Tom,
maybe take a look here: https://webis.de/publications.html#filter:ICWSM
In this paper we conducted a large-scale analysis of vandalism on Wikipedia across many languages, and the underlying software is available.
Best, Martin
On Wed, Jan 16, 2019 at 8:03 PM Thomas Stieve tomthirteen@email.arizona.edu wrote: