On 11-01-05 02:09 AM, Daniel Kinzler wrote:
On 05.01.2011 05:25, Jay Ashworth wrote:
I believe the snap reaction here is "you haven't tried to diff XML, have you?"
A text-based diff of XML sucks, but how about a DOM based (structural) diff?
-- daniel
I don't think a discussion on diff comparison of XML has much point.
I believe the idea floating around here (or at least the idea I have in mind based on these discussions) is that we would store page text in an XML format, a serialized PHP format, or something else in which the contents are semantically marked up, with things like '<template title="Template:Foo"><param name="1">...</param><param name="foo">bar</param></template><i>This is italic</i><link internal="true" title="FooBar">FooBar</link>'. To actually view and edit this page content, we would provide the data in multiple formats:
- Fully parsed output for page viewing.
- A semantically marked-up version of the HTML that is compatible with a WYSIWYG editor and can be converted back to the XML format and then saved.
- A WikiText-like format, similar to the WikiText we already have, that users can edit in plain text. We take the XML and convert it into that format, and when the user saves we parse it back into the XML format.
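To make the third format concrete, here is a minimal sketch of the XML-to-WikiText-like direction, run against the example markup above. The tag and attribute names come from that example; the wrapping <page> element, the function names, and the rule that numerically named params become positional are my own assumptions for illustration, not an existing MediaWiki API.

```python
import xml.etree.ElementTree as ET


def render_children(node):
    """Concatenate a node's text with its rendered children and their tails."""
    return (node.text or "") + "".join(
        to_wikitext(child) + (child.tail or "") for child in node
    )


def to_wikitext(node):
    """Render one element of the hypothetical XML format as WikiText-like text."""
    if node.tag == "template":
        name = node.get("title", "")
        if name.startswith("Template:"):
            name = name[len("Template:"):]
        parts = [name]
        for param in node.findall("param"):
            key = param.get("name", "")
            value = render_children(param)
            # Assumption: numeric names are positional, everything else is named.
            parts.append(value if key.isdigit() else key + "=" + value)
        return "{{" + "|".join(parts) + "}}"
    inner = render_children(node)
    if node.tag == "i":
        return "''" + inner + "''"
    if node.tag == "link":
        title = node.get("title", "")
        return "[[" + title + "]]" if inner == title else "[[" + title + "|" + inner + "]]"
    # Unknown tags (including the assumed <page> wrapper) pass through unchanged.
    return inner


SAMPLE = (
    '<page><template title="Template:Foo"><param name="1">...</param>'
    '<param name="foo">bar</param></template><i>This is italic</i>'
    '<link internal="true" title="FooBar">FooBar</link></page>'
)

print(to_wikitext(ET.fromstring(SAMPLE)))
# prints: {{Foo|...|foo=bar}}''This is italic''[[FooBar]]
```

The reverse direction (parsing the edited text back into XML on save) is the hard part and is not attempted here.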
Naturally, if we're doing things like this, then rather than diffing the ugly XML, the obvious approach would be to take the XML of both versions of the page, convert each into that WikiText-like plaintext format, and show the user a diff of that, so they can see what meaningful changes were made to the page. If you really wanted to, you could also offer a diff of the final HTML as an option, but that's fairly pointless.
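As a rough sketch of that flow: convert both stored XML revisions to the plaintext form first, then run an ordinary line diff over the result. The to_plaintext function below is a deliberately tiny stand-in for the real converter (it only knows about <i>), and the <page> wrapper and all names are assumptions made up for this example.

```python
import difflib
import xml.etree.ElementTree as ET


def to_plaintext(xml_source):
    """Stand-in converter: renders <i> as ''...'' and flattens any other tag."""
    root = ET.fromstring(xml_source)
    out = [root.text or ""]
    for child in root:
        body = "".join(child.itertext())
        out.append("''" + body + "''" if child.tag == "i" else body)
        out.append(child.tail or "")
    return "".join(out)


def revision_diff(old_xml, new_xml):
    """Diff the WikiText-like renderings of two revisions, not the raw XML."""
    old_lines = to_plaintext(old_xml).splitlines()
    new_lines = to_plaintext(new_xml).splitlines()
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile="old", tofile="new", lineterm=""
        )
    )


OLD = "<page>Hello <i>world</i>\nSecond line</page>"
NEW = "<page>Hello <i>wiki</i>\nSecond line</page>"

for line in revision_diff(OLD, NEW):
    print(line)
```

The user sees "-Hello ''world''" and "+Hello ''wiki''" rather than a pile of changed tags, which is the whole point of diffing the plaintext form.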
As an extra bonus, besides enabling WYSIWYG, having that XML format would also likely make it much easier to give users an in-page diff that marks up what actually changed in the content itself.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]