elwp@gmx.de (elwp@gmx.de) [050606 17:44]:
The window size of the deflate function is the main cause of this huge difference. Its maximum value is 32 kB, but many pages - especially discussion pages - are larger, so a repeated passage that lies more than 32 kB behind the current position cannot be found as a match. So you must bring matching regions closer together. Splitting files by section and sorting the sections of several revisions by section heading does exactly this. (And additionally one can omit unchanged sections.)
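A rough sketch of the effect, not the actual tool discussed here: two synthetic revisions of a page are compressed once in revision order and once with their sections sorted by heading. The headings, the section size of 20,000 characters, the pseudo-random text and all function names are assumptions made for the illustration. The only real API used is Python's zlib module.

```python
import random
import zlib

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def make_section(heading: str, seed: int, size: int = 20_000) -> bytes:
    """Build one synthetic section: a heading line plus pseudo-random text."""
    rng = random.Random(seed)
    body = "".join(rng.choices(ALPHABET, k=size))
    return f"== {heading} ==\n{body}\n".encode("ascii")


def build_revision(seeds: dict) -> dict:
    """Map each (hypothetical) heading to its section text."""
    return {heading: make_section(heading, seed) for heading, seed in seeds.items()}


# Two revisions of about 80 kB each; only the "Delta" section differs.
rev1 = build_revision({"Alpha": 1, "Beta": 2, "Gamma": 3, "Delta": 4})
rev2 = build_revision({"Alpha": 1, "Beta": 2, "Gamma": 3, "Delta": 5})
revisions = [rev1, rev2]

# Revision order: identical sections end up about 80 kB apart,
# which is outside the 32 kB deflate window.
by_revision = b"".join(text for rev in revisions for text in rev.values())

# Heading order: both versions of a section are adjacent (about 20 kB apart),
# so the second copy can be encoded as back-references to the first.
by_heading = b"".join(rev[h] for h in sorted(rev1) for rev in revisions)

size_by_revision = len(zlib.compress(by_revision, 9))
size_by_heading = len(zlib.compress(by_heading, 9))

print("input bytes:          ", len(by_revision))
print("compressed by revision:", size_by_revision)
print("compressed by heading: ", size_by_heading)
```

Both streams contain exactly the same bytes in a different order, so any difference in compressed size comes from the ordering alone. Omitting unchanged sections entirely, as mentioned above, is not shown.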
You mean something closer to keeping diffs rather than each revision in its entirety? Have you tested using plain diff rather than this custom diff-like thing?
- d.