I believe that Ellery's work used my mwdiffs library which is largely based
On Sun, Sep 3, 2017 at 2:54 PM, Pinkesh Badjatiya <
I was exploring the dataset shared in the Wikipedia Detox
project. I was trying to use similar diff logic to obtain the changes
from a page using *revid* but realized that the Wikipedia API provides only
the diff of the revision with its earlier version. I am able to fetch the
diffs for a set of *revids* using the Wikipedia API, but I am unable to
extract only the changed sentences in the revision. I found this
script in the project source files that contains bits of what might have
been used in the actual data collection process to obtain the changes from
the Talk pages, but I am unable to figure out the high-level details
such as the input and output formats.
Can anyone provide a solution to this or any suggestions on how to proceed?
Also, it would be really beneficial if I could use the same diff logic as
used by the original authors to ensure consistency.
Meanwhile, I have asked a similar question on StackOverflow and
emailed the original Wikimedia author of the paper.
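
For reference, here is a rough sketch of one way to isolate the added
sentences once the text of a revision and the text of its parent have both
been fetched. This is not the mwdiffs logic and not the method used to build
the Detox dataset, so its output will likely differ from the published data.
It relies only on Python's standard difflib module, the helper names
split_sentences and added_sentences are hypothetical, and the sentence
splitter is deliberately naive.

```python
import difflib
import re


def split_sentences(text):
    """Naive splitter (an assumption): break on newlines and after . ! or ?"""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [p.strip() for p in parts if p.strip()]


def added_sentences(parent_text, revision_text):
    """Return sentences in revision_text that have no match in parent_text."""
    old = split_sentences(parent_text)
    new = split_sentences(revision_text)
    # Compare the two sentence sequences; autojunk is disabled so that
    # frequently repeated sentences are not ignored on long pages.
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both mark new material on the revision side.
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return added


if __name__ == "__main__":
    parent = "First comment. Second comment."
    revision = "First comment. Second comment.\n:A new reply here. Thanks!"
    for sentence in added_sentences(parent, revision):
        print(sentence)
```

Working at the sentence level keeps an edited sentence whole instead of
returning word fragments, at the cost of reporting the full sentence even
when only one word in it changed.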
Wiki-research-l mailing list