Hi both,
Many thanks for all the help!
All the information was really helpful.
Best wishes,
Elisavet
On Mon, Sep 14, 2020 at 3:32 PM Isaac Johnson <isaac(a)wikimedia.org>
wrote:
Hi Elisavet,
Reverts:
Sumit pointed you in the right direction. In case you're curious about
types of reverts beyond identity reverts, this phab task has an excellent
write-up / analysis of how to detect the different types:
https://phabricator.wikimedia.org/T252366
Deletions:
I'm not fully sure what you are referring to here, but if it's:
* revision deletion
<https://www.wikidata.org/wiki/Wikidata:Deletion_policy#Revision_deletion>:
edits that have been deleted will show up as deleted in the XML history
dumps. You can also find more details in the deletion log (table details
<https://www.mediawiki.org/wiki/Manual:Logging_table>; example XML dump
<https://dumps.wikimedia.org/wikidatawiki/20200701/wikidatawiki-20200701-pages-logging.xml.gz>
)
* blanking of content: there is no foolproof automatic way to detect when
editors are deleting full sections/pages, but there are a few options. You can
inspect edit comments and page diffs for evidence of blanking, but an
easier place to start is probably with automatic edit tags (e.g.,
mw-blank). Here's the full list of tags
<https://www.wikidata.org/wiki/Special:Tags> and dump of tag types (table
<https://www.mediawiki.org/wiki/Manual:Change_tag_def_table>; example SQL
dump
<https://dumps.wikimedia.org/wikidatawiki/20200701/wikidatawiki-20200701-change_tag_def.sql.gz>)
and all tags applied (table
<https://www.mediawiki.org/wiki/Manual:Change_tag_table>; example SQL dump
<https://dumps.wikimedia.org/wikidatawiki/20200701/wikidatawiki-20200701-change_tag.sql.gz>).
I'm not familiar with Wikidata tags so you probably want to do some
examination of what they're actually detecting to make sure it's what you
are looking for before you rely on them for analysis.
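A rough sketch of both options above, in case it helps. It assumes the XML
export marks hidden parts of a revision with a deleted="deleted" attribute on
the <text>, <comment>, or <contributor> element, and that the two tag tables
have already been parsed into tuples using the ctd_id/ctd_name and
ct_rev_id/ct_tag_id columns from the manual pages. The function names and
sample data are made up for illustration; please check them against a real
dump before relying on this.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def deleted_parts(xml_stream):
    """Yield (rev_id, [hidden field names]) for revisions with deleted parts.

    Assumption: hidden fields carry the attribute deleted="deleted".
    """
    for _, elem in ET.iterparse(xml_stream, events=("end",)):
        if local(elem.tag) != "revision":
            continue
        rev_id = None
        hidden = []
        for child in elem:  # direct children of <revision> only
            name = local(child.tag)
            if name == "id":
                rev_id = int(child.text)
            elif child.get("deleted") == "deleted":
                hidden.append(name)
        if hidden:
            yield rev_id, hidden
        elem.clear()  # free memory; history dumps are large


def revisions_with_tag(tag_defs, tag_rows, tag_name):
    """Return sorted revision ids that carry the given tag.

    tag_defs: iterable of (ctd_id, ctd_name) from change_tag_def
    tag_rows: iterable of (ct_rev_id, ct_tag_id) from change_tag
    """
    wanted = {ctd_id for ctd_id, name in tag_defs if name == tag_name}
    return sorted(rev for rev, tag_id in tag_rows if tag_id in wanted)


# Hypothetical sample data, not taken from a real dump.
SAMPLE_XML = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Q42</title>
    <id>1</id>
    <revision>
      <id>100</id>
      <contributor><username>A</username><id>7</id></contributor>
      <comment>ok</comment>
      <text>{}</text>
    </revision>
    <revision>
      <id>101</id>
      <contributor deleted="deleted" />
      <comment deleted="deleted" />
      <text deleted="deleted" />
    </revision>
  </page>
</mediawiki>"""

SAMPLE_TAG_DEFS = [(1, "mw-blank"), (2, "mw-undo")]
SAMPLE_TAG_ROWS = [(500, 2), (501, 1), (502, 1)]

if __name__ == "__main__":
    print(list(deleted_parts(io.BytesIO(SAMPLE_XML))))
    print(revisions_with_tag(SAMPLE_TAG_DEFS, SAMPLE_TAG_ROWS, "mw-blank"))
```

For the real files you would open the dump with gzip.open(...) and pass the
file object to deleted_parts; loading the .sql.gz tables is left out here.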
Best,
Isaac
On Fri, Sep 11, 2020 at 8:48 PM Sumit Asthana <asthana.sumit23(a)gmail.com>
wrote:
Hi Elisavet,
You can identify reverts using the sha1 checksum of revisions. You can use
the mwreverts library[0] to do that in the dump. The editquality[1] repository
has such a use case for detecting reverts. You will not be able to detect
partial reverts, but it will detect identity reverts, which form the majority
of reverts.
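To illustrate the idea, here is a minimal sketch of identity-revert detection
from checksums alone. It is a simplified stand-in written in plain Python, not
the mwreverts implementation; the function name, the radius default, and the
sample data are assumptions for the example. A revision counts as a revert
when its sha1 matches an earlier revision of the same page within the last
few edits, and everything in between counts as reverted.

```python
from collections import deque


def detect_identity_reverts(revisions, radius=15):
    """Yield (reverting_id, reverted_ids, reverted_to_id) for one page.

    revisions: iterable of (rev_id, sha1) in chronological order.
    radius: maximum number of revisions a single revert may undo.
    """
    window = deque(maxlen=radius + 1)  # most recent (rev_id, sha1) pairs
    for rev_id, sha1 in revisions:
        recent = list(window)
        # Walk backwards to find the most recent revision with this checksum.
        for i in range(len(recent) - 1, -1, -1):
            if recent[i][1] == sha1:
                reverted = [r for r, _ in recent[i + 1:]]
                if reverted:  # an empty list means nothing was undone
                    yield rev_id, reverted, recent[i][0]
                break
        window.append((rev_id, sha1))


# Hypothetical page history; the letters stand in for real sha1 values.
SAMPLE_HISTORY = [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "d"), (6, "a")]

if __name__ == "__main__":
    for revert in detect_identity_reverts(SAMPLE_HISTORY):
        print(revert)
```

In the sample, revision 3 restores revision 1 (undoing 2), and revision 6
restores revision 3 (undoing 4 and 5).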
- Regards
Sumit Asthana
[0] -
https://pythonhosted.org/mwreverts/
[1] -
https://github.com/wikimedia/editquality/blob/master/editquality/utilities/…
On Fri, Sep 11, 2020 at 2:55 AM Elisavet Koutsiana <
elisavetkoutsiana(a)gmail.com> wrote:
Hello,
I wanted to ask if there is any canonical way to identify deletion,
reverts etc in the edit history xml files. I can understand that the action
of every revision is described in the "comment" element of the xml format,
but is there a code name or number or anything else that will help me to
identify one revision for example as deletion?
Thank you,
Elisavet
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata