On Tue, Sep 13, 2005 at 11:44:41AM +0200, Jakob Voss wrote:
MediaWiki's diff code was mostly written by Geoffrey Dairiki for PhpWiki. It was imported into MediaWiki by Magnus and has since been modified by a few MediaWiki developers. It's a longest common subsequence (LCS) algorithm, "mostly lifted" from a Perl module by Ned Konz, now called Algorithm::DiffOld. It's called DiffOld because the new Algorithm::Diff is faster. Maybe if someone's feeling ambitious, they could port the new Perl module. I was wondering whether those string comparisons were the slow part, and it seems the author of the new Algorithm::Diff package found that they were.
As far as I could see, Algorithm::Diff uses the classic longest common subsequence algorithm by Dan Hirschberg, which runs in O(n*m) time, where n and m are the lengths of the two texts, and uses only linear space: http://www.ics.uci.edu/~eppstein/161/960229.html
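To make the time and space claims concrete, here is a minimal, self-contained Perl sketch (my own code, not taken from Algorithm::Diff) of the length-only LCS computation that Hirschberg's algorithm builds on. It keeps just two rows of the n*m dynamic-programming table, so memory stays linear; the full algorithm then recurses on this routine to recover the subsequence itself, still in linear space:

  use strict;
  use warnings;

  # Length-only LCS in O(n*m) time and O(m) space: only two rows of
  # the DP table are kept, the building block of Hirschberg's algorithm.
  sub lcs_length {
      my ( $xs, $ys ) = @_;             # array refs of elements
      my @prev = (0) x ( @$ys + 1 );    # DP row for the previous prefix of @$xs
      for my $i ( 1 .. @$xs ) {
          my @cur = (0);                # column 0 is always 0
          for my $j ( 1 .. @$ys ) {
              $cur[$j] =
                  $xs->[ $i - 1 ] eq $ys->[ $j - 1 ]
                ? $prev[ $j - 1 ] + 1
                : ( $prev[$j] > $cur[ $j - 1 ] ? $prev[$j] : $cur[ $j - 1 ] );
          }
          @prev = @cur;
      }
      return $prev[-1];
  }

  # The LCS of "ABCBDAB" and "BDCABA" has length 4 (e.g. "BCBA").
  print lcs_length( [ split //, "ABCBDAB" ], [ split //, "BDCABA" ] ), "\n";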
Quick field report:
I tried running Algorithm::Diff on the split("",$str)'ed contents of two revisions, and for bigger pages (around 30 kB) it takes a few minutes. This is weird.
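For scale: splitting a 30 kB page into characters gives roughly 30,000 elements per side, so if the underlying algorithm really is O(n*m) as described above, it has on the order of 10^9 index pairs to consider, which would explain minutes of runtime; splitting on newlines keeps the sequences a few hundred elements long. A test along those lines presumably looks something like the sketch below (the file slurping and variable names are mine, not from the actual test):

  use strict;
  use warnings;
  use Algorithm::Diff qw(diff);

  # Slurp the two revision files named on the command line.
  my ( $old_rev, $new_rev ) = map {
      open my $fh, '<', $_ or die "$_: $!";
      local $/;                 # slurp mode
      scalar <$fh>;
  } @ARGV;

  # Character-level: ~30,000 elements per side for a 30 kB page (slow).
  my @char_hunks = diff( [ split //, $old_rev ], [ split //, $new_rev ] );

  # Line-level: typically a few hundred elements per side (fast).
  my @line_hunks = diff( [ split /\n/, $old_rev ], [ split /\n/, $new_rev ] );

  printf "%d char-level hunks, %d line-level hunks\n",
      scalar @char_hunks, scalar @line_hunks;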
How is MediaWiki doing it? Does it try to find the longest common subsequence of lines or of characters?