On Sat, Mar 22, 2008 at 6:28 PM, Guy Van den Broeck guyvdb@gmail.com wrote:
> I want to get some feedback on a possible Summer of Code project proposal. For last year's GSoC I created an HTML diffing library for Daisy CMS. The algorithm has proven to work well and I'm thinking of porting it to MediaWiki.
> What the algorithm does is take the source of two pages and merge them into a single page that visualizes the diff. The code I already have produces something like this: http://users.pandora.be/guyvdb/wikipediadiff.jpg
> Is this a feasible project for Wikimedia? I'm personally not very impressed with the current "diff pages". I think a visual diff would bring that part of MediaWiki up to par with the rest of the software.
I agree that inline diffs would be nicer than side-by-side ones. Making it an HTML-rendered diff instead of a wikitext diff is useful to some extent, but it hides information. It seems like it would be relatively difficult to convey that templates or images were changed, for instance, and things like comments (which must be included in diffs for proper usability) would also be an issue. Some mechanism would have to be devised to show that such invisible changes took place. Possibly you could have an option to do a wikitext diff instead, but that doesn't seem ideal to me. A single approach that works well for everyone would be best, if possible.
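To make the comment concern concrete, here is a rough sketch of an inline diff over wikitext source that keeps HTML comments visible. It is not the Daisy algorithm and not MediaWiki's diff engine; it simply uses Python's standard difflib, and the names tokenize and inline_diff are made up for illustration. The assumption is that comments are treated as single tokens and escaped on output, so a changed comment shows up as an insertion or deletion instead of vanishing in the rendered page.

```python
import difflib
import html
import re

# Hypothetical tokenizer: an HTML comment is one token, so it is diffed as a
# unit; everything else is split into whitespace runs and word-like chunks.
TOKEN_RE = re.compile(r"<!--.*?-->|\s+|[^\s<]+|<", re.DOTALL)


def tokenize(text):
    """Split wikitext into comment, whitespace, and word tokens."""
    return TOKEN_RE.findall(text)


def inline_diff(old, new):
    """Merge two wikitext sources into one string with <del>/<ins> markup."""
    a, b = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(html.escape("".join(a[i1:i2])))
            continue
        # Escaping the source keeps comments and tags visible to the reader.
        if tag in ("delete", "replace"):
            out.append("<del>" + html.escape("".join(a[i1:i2])) + "</del>")
        if tag in ("insert", "replace"):
            out.append("<ins>" + html.escape("".join(b[j1:j2])) + "</ins>")
    return "".join(out)


if __name__ == "__main__":
    print(inline_diff("the quick fox", "the slow fox"))
    print(inline_diff("Hello world", "Hello <!-- check source --> world"))
```

This only covers the wikitext side. A rendered-HTML diff would still need some separate way to flag changes inside templates or image links, which this sketch does not attempt.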
As for performance, please note that Wikimedia uses a diff engine written in C++. One written in PHP would probably not be acceptable on Wikipedia, from past experience (diffing used to eat a huge amount of CPU). Scalability is also important, within reason: [[George W. Bush]] is 128 KiB, for instance.