On Mon, 12 Sep 2005 22:51:28 +0200, Tomasz Wegrzanowski wrote:
Maybe download a dump and try with xdelta?
Looks promising. The only drawback I can see is that it stores an MD5 sum. For very small changes, that can make it less space-efficient than an ordinary diff in RCS format, and the checksum is simply unnecessary for MediaWiki. I'll see if there's a way to disable the MD5 sum; if not, the source may need to be hacked.
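To put a rough number on the overhead concern, here is a minimal Python sketch. It assumes one raw 16-byte MD5 digest is stored per delta; I have not checked xdelta's actual on-disk layout, so treat the model and the hypothetical helper name md5_overhead_ratio as illustration only.

```python
import hashlib


def md5_overhead_ratio(delta_size: int) -> float:
    """Fraction of a stored record taken up by the checksum.

    Assumption (not verified against xdelta): each stored delta carries
    exactly one raw MD5 digest and nothing else besides the delta bytes.
    """
    digest_size = hashlib.md5().digest_size  # a raw MD5 digest is 16 bytes
    return digest_size / (delta_size + digest_size)


# Compare a tiny edit, a modest edit, and a large rewrite.
for size in (10, 100, 10_000):
    ratio = md5_overhead_ratio(size)
    print(f"{size:>6}-byte delta: checksum is {ratio:.1%} of the stored record")
```

Under that assumption the checksum dominates only when the delta itself is a handful of bytes, so whether it matters depends on how many revisions are tiny edits.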
Now I have a David and Goliath problem: 56k dial-up vs. a 31 GB XML download. Can anyone suggest a source for a smaller data set in English with some representative multiple-revision articles, preferably including a few edit wars?