Having XML-based content would also enable a wide variety of new re-uses of Wikimedia content. People could build all sorts of custom apps, games, feeds, etc., without having to worry about broken syntax or resorting to screen scraping (like we do for our mobile site). It would also make implementing semantic features easier and thus could improve our search capabilities. Plus it makes a great Bloody Mary!
Ryan Kaldari
On 1/5/11 8:26 AM, Daniel Friesen wrote:
On 11-01-05 02:09 AM, Daniel Kinzler wrote:
On 05.01.2011 05:25, Jay Ashworth wrote:
I believe the snap reaction here is "you haven't tried to diff XML, have you?"
A text-based diff of XML sucks, but how about a DOM based (structural) diff?
-- daniel
I don't think a discussion on diff comparison of XML has much point.
I believe the idea floating around here (or at least the idea I have in mind based on these discussions) is that we would store page text in an XML format, a serialized PHP format, or something else in which content is semantically marked up with things like '<template title="Template:Foo"><param name="1">...</param><param name="foo">bar</param></template><i>This is italic</i><link internal="true" title="FooBar">FooBar</link>'. To actually edit this page content, we would provide the data in multiple formats:
- Fully parsed output for page viewing
- A semantically marked-up version of the HTML that is compatible with a WYSIWYG editor and can be converted back to the XML format and then saved
- A WikiText-like format, similar to the WikiText we already have, that users can edit in plaintext: we take the XML and convert it into that format, and when the user saves we parse it back into the XML format.
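
To make the XML-to-WikiText direction of that last item concrete, here is a rough sketch in Python. It is only an illustration under assumptions: the <page> wrapper element, the function names, and the rule that a purely numeric param name is a positional argument are all made up for the example; only the template/param/i/link markup comes from the sample above.

```python
import xml.etree.ElementTree as ET


def render_children(node):
    """Concatenate a node's text, its rendered children, and their tail text."""
    parts = [node.text or ""]
    for child in node:
        parts.append(to_wikitext(child))
        parts.append(child.tail or "")
    return "".join(parts)


def to_wikitext(node):
    """Render one element in a WikiText-like syntax (hypothetical mapping)."""
    if node.tag == "template":
        name = node.get("title", "")
        if name.startswith("Template:"):
            name = name[len("Template:"):]
        args = []
        for param in node.findall("param"):
            key = param.get("name", "")
            value = render_children(param)
            # Assumption: numeric names are positional and written without "name=".
            args.append(value if key.isdigit() else key + "=" + value)
        return "{{" + "|".join([name] + args) + "}}"
    if node.tag == "i":
        return "''" + render_children(node) + "''"
    if node.tag == "link":
        target = node.get("title", "")
        label = render_children(node)
        if label == target:
            return "[[" + target + "]]"
        return "[[" + target + "|" + label + "]]"
    # Unknown tags, including the wrapper root, contribute only their content.
    return render_children(node)


def xml_to_wikitext(xml_string):
    """Parse a stored XML page and return its WikiText-like form."""
    return to_wikitext(ET.fromstring(xml_string))


sample = (
    '<page><template title="Template:Foo"><param name="1">...</param>'
    '<param name="foo">bar</param></template><i>This is italic</i>'
    '<link internal="true" title="FooBar">FooBar</link></page>'
)
print(xml_to_wikitext(sample))
```

The reverse direction (parsing the user's saved text back into XML) is the hard part and is not attempted here.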
If we're doing things like this, then rather than diffing the ugly XML, the natural thing would most likely be to take the XML of both revisions, convert each into that WikiText-like plaintext format, and show the user a diff of that, so they can see what meaningful changes were made to the page. If you really wanted to, you could also offer a diff of the final HTML as an option, but that's fairly pointless.
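
As a rough sketch of that approach in Python: render both stored XML revisions to the plaintext form first, then run an ordinary line diff over the result. The render() function here is a deliberately tiny, hypothetical stand-in that only knows about <i>, and the <page> wrapper is likewise an assumption; a real renderer would cover templates, links, and so on.

```python
import difflib
import xml.etree.ElementTree as ET


def render(node):
    """Hypothetical, minimal XML -> WikiText-like renderer (italics only)."""
    body = (node.text or "") + "".join(
        render(child) + (child.tail or "") for child in node
    )
    return "''" + body + "''" if node.tag == "i" else body


def wikitext_diff(old_xml, new_xml):
    """Diff two XML revisions by comparing their WikiText-like renderings."""
    old_lines = render(ET.fromstring(old_xml)).splitlines()
    new_lines = render(ET.fromstring(new_xml)).splitlines()
    return list(
        difflib.unified_diff(old_lines, new_lines, "old", "new", lineterm="")
    )


old_rev = "<page>Hello <i>world</i>\nSecond line</page>"
new_rev = "<page>Hello <i>there</i>\nSecond line</page>"
for line in wikitext_diff(old_rev, new_rev):
    print(line)
```

The user never sees the XML in the diff, only the familiar markup, which is the point of converting before comparing.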
As an extra bonus, besides enabling WYSIWYG, that XML format also stands a good chance of making it much easier to give users an in-page diff that marks up what actually changed in the content itself.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]