[Pywikipedia-l] [ pywikipediabot-Bugs-1804008 ] showDiff() highlighting limitation due to difflib design

SourceForge.net noreply at sourceforge.net
Sat Apr 11 09:31:49 UTC 2009


Bugs item #1804008, was opened at 2007-09-28 09:35
Message generated for change (Comment added) made by nicdumz
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1804008&group_id=93107

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: General
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: Francesco Cosoleto (cosoleto)
Assigned to: Francesco Cosoleto (cosoleto)
Summary: showDiff() highlighting limitation due to difflib design

Initial Comment:
showDiff() can fail to highlight a char-by-char difference because Python's difflib does not seem to fully support char-by-char comparison.

Please see in Python tracker:

* issue #1528074: "difflib.SequenceMatcher.find_longest_match()  wrong result" (http://bugs.python.org/issue1528074)

* issue #1678345: "A fix for the bug #1528074 [warning: quite slow]" (http://bugs.python.org/issue1678345)
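
A minimal sketch of how such a failure can show up, assuming the limitation is difflib's "popular element" heuristic (in sequences of 200 or more items, elements that occur very often are ignored when matching). The `autojunk` parameter used here to switch the heuristic off only exists in later Python releases (2.7.1 and 3.2 onwards), so it was not available when this report was filed; the strings are made up for illustration.

```python
import difflib

# Illustrative input: the second string is the first one minus its
# leading character. Both are long enough to trigger the heuristic.
a = "a" + "b" * 200
b = "b" * 200

# Default behaviour: "b" is treated as a popular element and ignored,
# so no common block is found at all.
with_heuristic = difflib.SequenceMatcher(None, a, b)

# Heuristic disabled (autojunk is an assumption about the Python version
# in use, see above): the run of 200 "b" characters is matched.
without_heuristic = difflib.SequenceMatcher(None, a, b, autojunk=False)

longest_default = with_heuristic.find_longest_match(0, len(a), 0, len(b))
longest_exact = without_heuristic.find_longest_match(0, len(a), 0, len(b))

print(longest_default.size, with_heuristic.ratio())
print(longest_exact.size, without_heuristic.ratio())
```

With the heuristic active the two strings look entirely different to the matcher, which is the kind of result that would leave a one-character change unhighlighted in a char-by-char diff.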

----------------------------------------------------------------------

>Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-04-11 11:31

Message:
Actually, I'd very much like to see better diff support for pywikipedia. I
don't know why I missed that bug =)

I see in those bugs several comments about complexity changes, saying that
a patch could change complexity from O(n*m) to O(n+m), which certainly
looks interesting. If char-by-char comparison provides better diffs, at a
lower cost, what exactly is the reason for not supporting it in Python? :s

Two things to look at during implementation:
* Would it provide interesting diffs for all cases? (if one case is
improved while other matches get worse, it's not so interesting anymore)
* Performance changes for big diffs.
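
One way to check the first point could be to list the spans a char-by-char comparison would highlight, with and without the heuristic, for a set of sample strings. This is only a sketch: `changed_spans` is a hypothetical helper, not part of pywikipedia or difflib, and the `autojunk` parameter assumes Python 2.7.1 / 3.2 or later.

```python
import difflib

def changed_spans(old, new, autojunk=True):
    """Return the non-equal opcodes of a char-by-char comparison.

    Hypothetical helper: each entry is (tag, i1, i2, j1, j2), meaning
    old[i1:i2] should be highlighted against new[j1:j2].
    """
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=autojunk)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]

# Short strings are unaffected by the heuristic.
print(changed_spans("colour", "color"))

# Long, repetitive strings (made up for illustration) can differ a lot.
old = "a" + "b" * 200
new = "b" * 200
print(changed_spans(old, new))                  # heuristic on
print(changed_spans(old, new, autojunk=False))  # heuristic off
```

Running the same helper over large real page texts, and timing it, would be a simple way to look at the second point as well.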

Good luck =)

----------------------------------------------------------------------

Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-04-11 11:18

Message:
Assigned to myself before somebody steals this issue from me. I am going to
add a modified difflib version, unless the missing feature has been fixed in
recent Python builds or, of course, anyone objects. I am not
sure about a config option to enable or disable line-by-line/char-by-char
comparison.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2008-01-25 22:38

Message:
Logged In: NO 

Guess this is an example
http://bildr.no/view/146822

----------------------------------------------------------------------

Comment By: Francesco Cosoleto (cosoleto)
Date: 2007-09-28 09:38

Message:
Logged In: YES 
user_id=181280
Originator: YES

File Added: difflib_test.py

----------------------------------------------------------------------



