2008/3/23, Simetrical Simetrical+wikilist@gmail.com:
On Sat, Mar 22, 2008 at 6:28 PM, Guy Van den Broeck guyvdb@gmail.com wrote:
I want to get some feedback on a possible Summer of Code project proposal.
For last year's GSoC I created an HTML diffing library for Daisy CMS. The algorithm has proven to work well and I'm thinking of porting it to mediawiki.
The algorithm takes the source of two pages and merges them to visualize the diff. The code I already have does something like this: http://users.pandora.be/guyvdb/wikipediadiff.jpg
Is this a feasible project for wikimedia? I'm personally not very impressed with the current "diff pages". I think a visual diff would bring that part of mediawiki up to par with the rest of the software.
I agree that inline diffs would be nicer, instead of side-by-side. Having an HTML-rendered diff instead of a wikitext diff is useful to some extent, but it hides information. It seems like it would be relatively difficult to convey the fact that templates or images were changed, for instance, and things like comments (which must be included in diffs for proper usability) would also be an issue. Some mechanism would have to be devised to convey that such invisible changes took place. Possibly you could have an option to do a wikitext diff instead, but that doesn't seem ideal to me. Doing it one way that works well for everyone would be best if possible.
As for performance, please note that Wikimedia uses a diff engine written in C++. One written in PHP would probably not be acceptable on Wikipedia, from past experience (diffing used to eat a huge amount of CPU). Scalability is also important, within reason: [[George W. Bush]] is 128 KiB, for instance.
Actually, images are handled rather well: http://cocoondev.org/daisy/index/version/12/diff?&otherDocumentId=2-cd&a... Note that the image overlays are probably wrong in Safari, but in principle it works for images.
Templates and, for instance, table changes are handled too. In Daisy we chose to display a tooltip window with an interpretation of the underlying HTML changes. I'm sure we can find something similar tailored for the needs of mediawiki.
If I start working on the HTML diff then I might as well add a word-for-word source diff like I did for Daisy: http://cocoondev.org/daisy/index/version/12/diff?&otherDocumentId=2-cd&a... It suffers from the same performance penalty as the HTML diff but it conveys all information present.
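To make the word-for-word idea concrete, here is a rough sketch. It is written in Python with the standard difflib module purely for illustration; it is not the Daisy code, and the names tokenize and word_diff are made up for this example. It splits both texts into words and whitespace runs, diffs the token lists, and merges the result into one inline string with <del> and <ins> markers.

```python
import difflib
import html
import re


def tokenize(text):
    """Split text into words and whitespace runs so spacing is preserved."""
    return re.findall(r"\s+|\S+", text)


def word_diff(old, new):
    """Return one inline string with removed words in <del> and added words in <ins>.

    Illustrative only: a real implementation would need to understand
    wikitext or HTML structure instead of treating the input as flat text.
    """
    a, b = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(html.escape("".join(a[i1:i2])))
            continue
        # "replace", "delete" and "insert" all reduce to: old tokens removed,
        # new tokens added (either side may be empty).
        if i1 != i2:
            out.append("<del>" + html.escape("".join(a[i1:i2])) + "</del>")
        if j1 != j2:
            out.append("<ins>" + html.escape("".join(b[j1:j2])) + "</ins>")
    return "".join(out)


if __name__ == "__main__":
    print(word_diff("the quick fox", "the slow fox"))
```

Because every token of both versions ends up in the output, nothing is hidden, which is the property that matters for a source diff.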
With respect to performance, I think there are a lot of options. We can fall back on a simpler diff when the file size or execution time exceeds a certain threshold, or the HTML diff can be an extra (experimental) link on the current diff page. In general, I don't think the performance concern should hold back this project. Once we have the optimized HTML diff code we can decide how and when to integrate it.
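The fallback could look roughly like the sketch below. Again this is Python for illustration only; the thresholds (200 KB, 1 second) and the names diff_with_fallback, detailed_diff and simple_diff are assumptions, not anything that exists in Daisy or mediawiki. The expensive diff is only attempted when the combined input is small enough, and it gives up once a time budget is spent, at which point the cheap line-based diff is used instead.

```python
import difflib
import time


class BudgetExceeded(Exception):
    """Raised when the detailed diff runs past its time budget."""


def simple_diff(old, new):
    """Cheap line-based unified diff, used as the fallback."""
    return list(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""))


def detailed_diff(old, new, deadline):
    """More expensive diff with intraline hints; checks the deadline between output lines."""
    result = []
    for line in difflib.ndiff(old.splitlines(), new.splitlines()):
        if time.monotonic() > deadline:
            raise BudgetExceeded
        result.append(line)
    return result


def diff_with_fallback(old, new, max_bytes=200_000, max_seconds=1.0):
    """Return (kind, lines), where kind is "detailed" or "simple".

    max_bytes and max_seconds are placeholder values, not tuned numbers.
    """
    size = len(old.encode("utf-8")) + len(new.encode("utf-8"))
    if size <= max_bytes:
        try:
            deadline = time.monotonic() + max_seconds
            return "detailed", detailed_diff(old, new, deadline)
        except BudgetExceeded:
            pass  # fall through to the cheap diff
    return "simple", simple_diff(old, new)


if __name__ == "__main__":
    kind, lines = diff_with_fallback("a\nb", "a\nc")
    print(kind)
    print("\n".join(lines))
```

Note that the time check here is cooperative: it only fires between output lines, so a single very expensive step can still overrun the budget. A production version would need a harder limit.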