Hi folks,
given a diff link between two revisions of a page, is there a way for a bot to tell how many modifications there are, i.e. in how many fragments the two revisions differ? I know this is not a precisely defined question, but the software itself must interpret it somehow, since it highlights the differences with colors.
Hi,
Short answer: the API does not seem to provide this directly, but you could use action=compare, count the occurrences of class="diff-lineno" in the returned HTML, and divide by 2 (it appears once on each side of the diff).
Do note that the question you ask is not very well defined. The method above gives you the number of chunks the diff engine displays, which can differ from what you would consider a modification. For instance, https://ro.wikipedia.org/w/api.php?action=compare&fromrev=9648318&to... shows 1 chunk, but it contains a modified line and an added line (the category), which you might count as 2 separate modifications. You need to set the expectations clearly before searching for a solution.
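A rough sketch of that counting approach in Python, in case it helps. The function names are mine, and I am assuming the JSON response carries the diff HTML under compare["*"] (or compare["body"] with the newer format version), so please check that against your wiki before relying on it.

```python
import json
import re
import urllib.parse
import urllib.request


def count_diff_chunks(diff_html):
    """Count displayed chunks: each chunk header carries the
    class="diff-lineno" marker twice, once per side of the diff."""
    markers = re.findall(r'class="diff-lineno"', diff_html)
    return len(markers) // 2


def fetch_diff_html(api_url, fromrev, torev):
    """Fetch the diff HTML via action=compare (needs network access).

    Assumption: the HTML sits under compare["*"] or compare["body"].
    """
    params = urllib.parse.urlencode({
        "action": "compare",
        "fromrev": fromrev,
        "torev": torev,
        "format": "json",
    })
    request = urllib.request.Request(
        f"{api_url}?{params}",
        # Hypothetical User-Agent string, replace with your bot's own.
        headers={"User-Agent": "diff-chunk-counter-example/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    compare = data["compare"]
    return compare.get("*") or compare.get("body", "")
```

Usage would be count_diff_chunks(fetch_diff_html(api_url, old_id, new_id)). As said above, the result is the number of chunks the diff engine shows, not necessarily the number of modifications you have in mind.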
Strainu
2016-08-03 0:23 GMT+03:00 Bináris wikiposta@gmail.com:
-- Bináris
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
2016-08-02 23:44 GMT+02:00 Strainu strainu10@gmail.com:
Do note that the question you ask is not very well defined.
Yes, I know that, but it is still a good lower-bound estimate: the real number is at least as large as the number we get with your method. Thank you!
On Wed, Aug 3, 2016 at 4:44 AM, Strainu strainu10@gmail.com wrote:
fwiw, Pywikibot has an APISite method "compare" to fetch the MediaWiki diff
https://doc.wikimedia.org/pywikibot/api_ref/pywikibot.html#pywikibot.site.AP...
and another function "html_comparator" to convert the MediaWiki diff into something a bit more usable for simple uses
https://doc.wikimedia.org/pywikibot/api_ref/pywikibot.html#pywikibot.diff.ht...
We have one unit test that puts these two functions together in a hopefully understandable fashion
https://github.com/wikimedia/pywikibot-core/blob/master/tests/diff_tests.py#...
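For anyone who just wants to see the idea without installing anything, here is a standard-library sketch that pulls the text of the deleted and added cells out of a MediaWiki diff table. To be clear, this is not Pywikibot's code; the class and function names are made up for illustration, and I am assuming the cells are marked with the diff-deletedline and diff-addedline classes.

```python
from html.parser import HTMLParser


class DiffCellCollector(HTMLParser):
    """Collect the text of deleted and added cells from diff HTML."""

    # Assumed CSS classes of the changed cells in the diff table.
    CLASS_TO_KEY = {
        "diff-deletedline": "deleted-context",
        "diff-addedline": "added-context",
    }

    def __init__(self):
        super().__init__()
        self.result = {"deleted-context": [], "added-context": []}
        self._key = None      # which list the current cell belongs to
        self._buffer = []     # text pieces of the current cell

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        classes = (dict(attrs).get("class") or "").split()
        for css_class, key in self.CLASS_TO_KEY.items():
            if css_class in classes:
                self._key = key
                self._buffer = []

    def handle_data(self, data):
        if self._key is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._key is not None:
            self.result[self._key].append("".join(self._buffer))
            self._key = None


def collect_changed_lines(diff_html):
    """Return a dict with the text of deleted and added diff cells."""
    parser = DiffCellCollector()
    parser.feed(diff_html)
    parser.close()
    return parser.result
```

The lengths of the two lists give a per-line view of the change, which may be closer to what you want than the chunk count.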
On Tue, Aug 2, 2016 at 2:23 PM, Bináris wikiposta@gmail.com wrote:
Unless you need to reproduce MediaWiki's exact method of computing diffs (which is actually configuration-dependent), there is no point in relying on the API. Just fetch the two revisions and calculate the distance yourself. The article about edit distance [0] might help you refine the question.
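As a minimal sketch of doing it locally, assuming you already have the wikitext of both revisions as strings: Python's difflib can report the runs of lines that differ, and counting the non-equal runs gives one possible definition of "fragments". The function name is only for illustration, and the result will not necessarily match what MediaWiki displays.

```python
import difflib


def count_changed_blocks(old_text, new_text):
    """Count contiguous runs of changed lines between two texts."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    # Each opcode is (tag, i1, i2, j1, j2); anything but "equal" is a change.
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")
```

Note that a modified line directly followed by an added line still counts as one block here, so the ambiguity Strainu pointed out does not go away.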
If you are looking for a pywikibot-based solution, editdistance [1] seems like the right library for the job.
[0] https://en.wikipedia.org/wiki/Edit_distance
[1] https://pypi.python.org/pypi/editdistance
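For reference, the Levenshtein distance described in [0] fits in a few lines of plain Python. This is a textbook dynamic-programming sketch of my own, not the code of the library in [1], and it will be slow on long pages; comparing lists of lines instead of characters is one way to keep it manageable.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn sequence a into sequence b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]
```

It accepts any two sequences, so levenshtein(old_text.splitlines(), new_text.splitlines()) gives a line-level distance.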