On Mon, Aug 28, 2006 at 10:52:42PM -0400, Simetrical wrote:
On 8/28/06, Jay R. Ashworth <jra@baylink.com> wrote:
Because in wikitext, everything is in-band; in XML, the structure is out-of-band, on purpose. This requires an entirely different and, I suspect, much more complicated diff algorithm.
I don't know what "in-band" and "out-of-band" mean ([[Out of band]] doesn't help either),
The current diff engine, with which I'm not intimately familiar (read that as: I haven't looked at the code at all, but I'm assuming it's somewhat similar to the Unix diff internals), works on one big stream of text. The structural markup is *part* of that stream of text; hence, in-band.
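To make "in-band" concrete, here is a minimal sketch in Python, using the standard difflib module as a stand-in for a Unix-style line differ. It is not the actual MediaWiki diff engine, which I haven't read; the function name and the sample wikitext are made up for illustration. The point is only that a heading-level change and a wording change come out as the same kind of thing: changed lines of text.

```python
import difflib

def inband_diff(old_text, new_text):
    """Line-based diff that treats markup as ordinary text.

    Hypothetical helper: returns only the removed ("- ") and added ("+ ")
    lines, dropping unchanged lines and ndiff's "? " hint lines.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    return [
        line
        for line in difflib.ndiff(old_lines, new_lines)
        if line.startswith(("- ", "+ "))
    ]

# Made-up wikitext: one markup change (heading level), one content change (year).
old = "== History ==\nThe town was founded in 1850.\n"
new = "=== History ===\nThe town was founded in 1852.\n"

changes = inband_diff(old, new)
for line in changes:
    print(line)
```

Nothing in the result says which change was structural and which was content; the differ never needed to know.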
but if the diff engine parses the XML, it can look for a) changes in structure/markup and b) changes in content.
Yep, and those will interact in ways different from the ways that they do now: the current diff engine need not "trip over" the edges of objects in the way that an XML parser will have to.
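For comparison, a rough sketch of what a parsed-XML diff might look like, in Python with the standard xml.etree.ElementTree module. This is an assumption-laden toy, not a proposal for the real differ: the element names (page, h, p) are invented, children are matched purely by position, and tail text is ignored. A real tree diff would also have to cope with inserted, deleted, and moved nodes, which is where I'd expect the complexity to show up.

```python
import xml.etree.ElementTree as ET
from itertools import zip_longest

def tree_diff(old, new, path=""):
    """Naive positional comparison of two XML elements.

    Hypothetical helper: returns (kind, path, old_value, new_value) tuples,
    where kind is "structure" (tag, attributes, added or removed node) or
    "content" (element text). Tail text is deliberately ignored.
    """
    if old is None:
        return [("structure", path, None, new.tag)]
    if new is None:
        return [("structure", path, old.tag, None)]

    changes = []
    here = f"{path}/{old.tag}"
    if old.tag != new.tag or old.attrib != new.attrib:
        changes.append(("structure", here,
                        (old.tag, dict(old.attrib)),
                        (new.tag, dict(new.attrib))))
    if (old.text or "").strip() != (new.text or "").strip():
        changes.append(("content", here, old.text, new.text))

    # Children are paired by position only; an insertion shifts every later pair.
    for i, (a, b) in enumerate(zip_longest(old, new)):
        changes.extend(tree_diff(a, b, f"{here}[{i}]"))
    return changes

# Invented markup: one attribute change (structure), one text change (content).
old_xml = '<page><h level="2">History</h><p>Founded in 1850.</p></page>'
new_xml = '<page><h level="3">History</h><p>Founded in 1852.</p></page>'

changes = tree_diff(ET.fromstring(old_xml), ET.fromstring(new_xml))
for change in changes:
    print(change)
```

Here the two kinds of change are reported separately, but only because the code walks element boundaries explicitly, which is exactly the "tripping over the edges" a flat text differ gets to skip.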
Either one should be very easy and fast to diff, given XML-parsing library functions (for the C++ module used on WMF sites, that is). Faster than present, I don't know, but the present differ is hardly a bottleneck.
Certainly. I wasn't suggesting that it was; rather, the opposite.
Anyone got any implementation experience with diffing XML trees?
Cheers, -- jra