Tim Starling wrote:
MediaWiki's diff code was mostly written by Geoffrey Dairiki for PhpWiki. It was imported into MediaWiki by Magnus and has since been modified by a few MediaWiki developers. It's a longest common subsequence (LCS) algorithm, and is "mostly lifted" from a Perl module by Ned Konz, now called Algorithm::DiffOld. It's called DiffOld because the new Algorithm::Diff is faster. Maybe if someone's feeling ambitious, they could port the new Perl module. I was wondering whether those string comparisons were the slow part, and it seems the author of the new Algorithm::Diff package found that they were.
As far as I can see, Algorithm::Diff uses the classic longest common subsequence algorithm by Dan Hirschberg, which runs in O(n*m) time, where n and m are the lengths of the two texts, and in linear space: http://www.ics.uci.edu/~eppstein/161/960229.html
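To make the O(n*m) cost concrete, here is a minimal sketch (in Python, not the module's Perl) of the classic dynamic-programming LCS diff. It fills the full n-by-m table for clarity; Hirschberg's refinement is what brings the space down to linear. The function name and tuple-based edit script are my own illustration, not Algorithm::Diff's actual interface.

```python
def lcs_diff(a, b):
    """Diff two sequences via the classic O(n*m) LCS dynamic program.

    Returns an edit script of (op, item) pairs, where op is
    ' ' (keep), '-' (delete from a), or '+' (insert from b).
    Note: this sketch uses the full table, so space is O(n*m);
    Hirschberg's divide-and-conquer version needs only linear space.
    """
    n, m = len(a), len(b)
    # table[i][j] = length of an LCS of a[:i] and b[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back through the table to recover an edit script.
    script, i, j = [], n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            script.append((' ', a[i - 1])); i -= 1; j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            script.append(('-', a[i - 1])); i -= 1
        else:
            script.append(('+', b[j - 1])); j -= 1
    while i > 0:
        script.append(('-', a[i - 1])); i -= 1
    while j > 0:
        script.append(('+', b[j - 1])); j -= 1
    script.reverse()
    return script
```

The nested loop over the table is where the n*m comparisons happen, which is exactly the cost the string comparisons above contribute to.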
There is also an algorithm by Eugene W. Myers that runs in O((n+m)*D) time, where D is the size of the difference (the minimal edit script) between the two texts: http://citeseer.ist.psu.edu/myers86ond.html
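For comparison, here is a sketch of the greedy forward pass of Myers' algorithm, again in Python and reduced to just computing D (a full diff additionally records the path). The point is that the work is bounded by the number of differences, so nearly identical texts, which is the common case for wiki edits, finish almost immediately.

```python
def myers_distance(a, b):
    """Length D of the shortest edit script (insertions + deletions)
    between sequences a and b, via Myers' O((n+m)*D) greedy algorithm."""
    n, m = len(a), len(b)
    # v[k] holds the furthest-reaching x on diagonal k (where x - y == k).
    v = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]          # step down: an insertion from b
            else:
                x = v[k - 1] + 1      # step right: a deletion from a
            y = x - k
            # Follow the "snake" of matching elements for free.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m
```

When the texts are identical, the very first snake at d = 0 runs to the end, so only O(n+m) comparisons are made, whereas the LCS table always costs n*m.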
Davide Libenzi implemented it in C: http://www.xmailserver.org/xdiff-lib.html
Maybe diffs are not the most important performance issue, but they can surely be sped up.
Greetings, Jakob