Tim Starling wrote:
MediaWiki's diff code was mostly written by
Geoffrey Dairiki, for
PhpWiki. It was imported into MediaWiki by Magnus and has since been
modified by a few MediaWiki developers. It's a longest common subsequence
(LCS) algorithm, and is "mostly lifted" from a Perl module by Ned Konz,
now called Algorithm::DiffOld. It's called DiffOld because the new
Algorithm::Diff is faster. Maybe if someone's feeling ambitious, they
could port the new Perl module. I was wondering whether those string
comparisons were the slow part, and it seems the author of the new
Algorithm::Diff package found they were.
As far as I can see, Algorithm::Diff uses the classic longest common
subsequence algorithm by Dan Hirschberg, which runs in O(n*m) time, where
n and m are the lengths of the two texts, and in linear space:
http://www.ics.uci.edu/~eppstein/161/960229.html
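For illustration, here is a minimal sketch (in Python, not the actual
PhpWiki/MediaWiki code) of the classic O(n*m) approach: fill the LCS
table, then backtrack to produce an edit script. Note that this naive
table version uses quadratic space; Hirschberg's refinement is what
brings the space down to linear.

```python
def lcs_diff(a, b):
    """Classic O(n*m) dynamic-programming diff sketch: build the LCS
    table for sequences a and b, then backtrack to emit '- ' (delete),
    '+ ' (insert) and '  ' (keep) lines."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the LCS of a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Backtrack from the bottom-right corner to recover an edit script.
    out, i, j = [], n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append('  ' + a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            out.append('- ' + a[i - 1]); i -= 1
        else:
            out.append('+ ' + b[j - 1]); j -= 1
    while i > 0:
        out.append('- ' + a[i - 1]); i -= 1
    while j > 0:
        out.append('+ ' + b[j - 1]); j -= 1
    return out[::-1]
```

In a real diff tool a and b would be the lists of lines of the two
texts; the table is what makes the quadratic time and space cost
obvious.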
There is also an algorithm by Eugene W. Myers that runs in O((n+m)*D)
time, where D is the size of the difference between the two texts:
http://citeseer.ist.psu.edu/myers86ond.html
Davide Libenzi implemented it in C:
http://www.xmailserver.org/xdiff-lib.html
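The point of Myers' algorithm is that the cost scales with the size of
the difference D rather than with n*m, which is a big win for the usual
wiki case of small edits to long pages. A minimal sketch of the greedy
core (again in Python for illustration, not Libenzi's C code), computing
just the edit distance D:

```python
def myers_distance(a, b):
    """Myers' greedy O((n+m)*D) sketch: find the length D of the
    shortest edit script (insertions + deletions) turning a into b.
    v[k] holds the furthest x reached on diagonal k = x - y."""
    n, m = len(a), len(b)
    max_d = n + m
    v = {1: 0}
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            # Choose whether to step down (insert) or right (delete).
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]          # step down: x unchanged
            else:
                x = v[k - 1] + 1      # step right: consume from a
            y = x - k
            # Follow the "snake": free diagonal moves over matches.
            while x < n and y < m and a[x] == b[y]:
                x += 1; y += 1
            v[k] = x
            if x >= n and y >= m:
                return d              # D edit operations suffice
    return max_d
```

Two nearly identical texts finish after only a few outer iterations,
which is exactly why this beats the O(n*m) table when D is small.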
Maybe diffs are not the most important performance issue, but they can
surely be sped up.
Greetings,
Jakob