Hi both,

Many thanks for all the help!
All the information was really helpful.

Best wishes,
Elisavet

On Mon, Sep 14, 2020 at 3:32 PM Isaac Johnson <isaac@wikimedia.org> wrote:
Hi Elisavet,

Reverts:
Sumit pointed you in the right direction. In case you're curious about types of reverts beyond identity reverts, this phab task has an excellent write-up / analysis of how to detect the different types: https://phabricator.wikimedia.org/T252366

Deletions:
I'm not fully sure what you are referring to here, but if it's:
* revision deletion: edits that have been deleted will show up as deleted in the XML history dumps. You can also find more details in the deletion log (table details; example XML dump)
* blanking of content: there is no foolproof automatic way to detect when editors are deleting full sections/pages, but there are a few options. You can inspect edit comments and page diffs for evidence of blanking, but an easier place to start is probably with automatic edit tags (e.g., mw-blank). Here's the full list of tags and dump of tag types (table; example XML dump) and all tags applied (table; example XML dump). I'm not familiar with Wikidata tags, so you probably want to examine what they're actually detecting to make sure it's what you are looking for before you rely on them for analysis.
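
To illustrate the revision-deletion case, here is a minimal Python sketch of scanning a page from the XML history dump for hidden fields. It assumes that suppressed fields are marked with a deleted="deleted" attribute on the <text>, <comment>, or <contributor> elements, which is how the MediaWiki export schema describes them; please check this against the actual dump before relying on it. The sample XML and the function names (local_name, deleted_fields) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample page; real dumps also carry an XML namespace on every tag.
SAMPLE_PAGE = """<page>
  <title>Example</title>
  <revision>
    <id>1</id>
    <comment>created</comment>
    <text>{}</text>
  </revision>
  <revision>
    <id>2</id>
    <contributor deleted="deleted" />
    <comment deleted="deleted" />
    <text deleted="deleted" />
  </revision>
</page>"""


def local_name(tag):
    """Drop a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def deleted_fields(page_xml):
    """Map revision id -> list of child fields flagged deleted="deleted".

    Assumption: hidden fields carry the attribute deleted="deleted".
    Revisions with no hidden fields are left out of the result.
    """
    root = ET.fromstring(page_xml)
    result = {}
    for elem in root.iter():
        if local_name(elem.tag) != "revision":
            continue
        rev_id = None
        hidden = []
        for child in elem:
            name = local_name(child.tag)
            if name == "id":
                rev_id = int(child.text)
            elif child.get("deleted") == "deleted":
                hidden.append(name)
        if hidden:
            result[rev_id] = hidden
    return result


print(deleted_fields(SAMPLE_PAGE))
```

For full dumps you would want a streaming parser (e.g. ET.iterparse) rather than loading a whole page string, since the history files are very large.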

Best,
Isaac

On Fri, Sep 11, 2020 at 8:48 PM Sumit Asthana <asthana.sumit23@gmail.com> wrote:
Hi Elisavet,

You can identify reverts using the sha1 checksum of revisions. You can use the mwreverts library[0] to do that in the dump. The Editquality[1] repository has such a use case for detecting reverts. You will not be able to detect partial reverts, but it will detect identity reverts, which form the majority of reverts.
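
To make the idea concrete, here is a minimal plain-Python sketch of identity-revert detection by checksum. It is not the mwreverts implementation, only a simplified version of the same idea: a revision counts as a revert if its sha1 matches that of a recent earlier revision of the same page, and everything in between counts as reverted. The function name and the radius parameter (how many intervening revisions may be reverted) are assumptions for illustration.

```python
from collections import deque


def detect_identity_reverts(revisions, radius=15):
    """Find identity reverts in one page's history.

    revisions: iterable of (rev_id, sha1) pairs in chronological order.
    radius: maximum number of intervening revisions that can be reverted
            (hypothetical parameter for this sketch).
    Returns a list of (reverting_id, reverted_ids, reverted_to_id).
    """
    # Keep the last radius + 1 revisions: the candidate "reverted to"
    # revision plus up to `radius` revisions after it.
    window = deque(maxlen=radius + 1)
    reverts = []
    for rev_id, sha1 in revisions:
        recent = list(window)
        # Search from newest to oldest for an identical checksum.
        for i in range(len(recent) - 1, -1, -1):
            if recent[i][1] == sha1:
                reverted_ids = [r for r, _ in recent[i + 1:]]
                # An immediate repeat of the same content reverts nothing.
                if reverted_ids:
                    reverts.append((rev_id, reverted_ids, recent[i][0]))
                break
        window.append((rev_id, sha1))
    return reverts


# Made-up history: revision 3 restores the content of revision 1.
history = [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "c")]
print(detect_identity_reverts(history))
```

Here revision 3 is reported as reverting revision 2 back to revision 1, while revision 5 (same content as 4, nothing in between) is not counted. Partial reverts change the checksum, so they are invisible to this approach.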


On Fri, Sep 11, 2020 at 2:55 AM Elisavet Koutsiana <elisavetkoutsiana@gmail.com> wrote:
Hello, 

I wanted to ask whether there is a canonical way to identify deletions, reverts, etc. in the edit history XML files. I understand that the action of every revision is described in the "comment" element of the XML format, but is there a code name, number, or anything else that would help me identify a revision as, for example, a deletion?

Thank you,
Elisavet  
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata