2011/1/17 Aryeh Gregor <Simetrical+wikilist@gmail.com>:
> Wikimedia stores diffs using delta compression, so actually this is basically what happens. The size of the edit is what determines the size of the stored diff, not the size of the page. (I don't know how this works in detail, though.) IIRC, default MediaWiki doesn't work this way.
Wikimedia doesn't technically use delta compression. It concatenates a couple dozen adjacent revisions of the same page and compresses the result (with gzip?), achieving very good compression ratios because there is a huge amount of duplication in, say, 20 adjacent revisions of [[Barack Obama]] (small changes to a large page, probably a few identical versions due to vandalism reverts, etc.). However, decompressing it just gets you the raw text, so nothing in this storage system helps with generating diffs. Diff generation is still done by shelling out to wikidiff2 (a custom C++ diff implementation that generates diffs with HTML markup like <ins>/<del>) and caching the result in memcached.
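To illustrate why concatenating adjacent revisions compresses so well, here is a rough Python sketch. It is not MediaWiki's actual storage code: the helper names (make_revisions, pack, unpack), the synthetic page text, and the NUL separator are all made up for the example, and zlib stands in for whatever compressor is really used.

```python
import random
import zlib


def make_revisions(count=20, words_per_page=1500, seed=0):
    """Build synthetic revisions: one page with a single-word edit per revision."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    vocab = [
        "".join(rng.choice(letters) for _ in range(rng.randint(3, 9)))
        for _ in range(500)
    ]
    page = [rng.choice(vocab) for _ in range(words_per_page)]
    revisions = []
    for _ in range(count):
        # A small change to a large page, as in the example above.
        page[rng.randrange(len(page))] = rng.choice(vocab)
        revisions.append(" ".join(page))
    return revisions


def pack(revisions):
    """Concatenate revisions (NUL-separated, an assumption) and compress once."""
    return zlib.compress("\x00".join(revisions).encode("utf-8"))


def unpack(blob):
    """Decompress and split; this only yields raw text, not diffs."""
    return zlib.decompress(blob).decode("utf-8").split("\x00")


revisions = make_revisions()
separate = sum(len(zlib.compress(r.encode("utf-8"))) for r in revisions)
together = len(pack(revisions))
print(f"compressed one by one: {separate} bytes")
print(f"compressed together:   {together} bytes")
```

Note that DEFLATE only looks back 32 KiB, so this toy page is kept well under that size; for much larger pages the gain would depend on the compressor's window. As the message says, unpack() only returns the full text of each revision, so a diff still has to be computed separately.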
Roan Kattouw (Catrope)