The new history compression class is finished now. I have entered it in Bugzilla (ID 2310). The average compression is about 5 times better than that of the currently used class, and access to individual revisions is generally faster (a factor of 2 or so). My class is slower only when all revisions in a history blob are read (e.g. when a page is exported), especially for pages with a large number of sections. These timings don't yet account for the time needed to load the history blobs, though.

I've tested the new class with [[de:Wikipedia:Löschkandidaten/5. Februar 2005]], a typical large discussion page with more than 50 headings. Reading all revisions consecutively takes about 0.5 seconds with ConcatenatedGzipHistoryBlobs and 1.4 seconds with SplitMergeGzipHistoryBlobs. On the other hand, the same text needs 58 ConcatenatedGzipHistoryBlobs totalling 5937 kB, but only 21 SplitMergeGzipHistoryBlobs totalling 508 kB. Maybe the time saved by loading the much smaller blobs fully compensates for the slower read access. What do you think?
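For anyone who hasn't looked at the patch yet, here is the rough idea as a minimal PHP sketch. The class and field names are invented for illustration (this is not the code attached to the bug), and the heading-based split is deliberately crude; it just mirrors the HistoryBlob-style addItem()/getItem() interface:

<?php
// Sketch only: split each revision at section headings, store each
// distinct section once (keyed by hash), and represent a revision as
// a list of section keys. Near-identical revisions (the common case
// on discussion pages) then share almost all of their storage.
class SplitMergeSketch {
    var $mSections  = array();  // md5 hash => section text
    var $mRevisions = array();  // revision index => list of hashes

    function addItem( $text ) {
        $keys = array();
        // Split before every "==" heading; crude, for illustration.
        foreach ( preg_split( '/(?=^==)/m', $text ) as $section ) {
            $hash = md5( $section );
            $this->mSections[$hash] = $section;  // merge: stored once
            $keys[] = $hash;
        }
        $this->mRevisions[] = $keys;
        return count( $this->mRevisions ) - 1;
    }

    function getItem( $id ) {
        // Reassemble one revision: one hash lookup per section,
        // which is why reading every revision costs more here than
        // inflating one concatenated gzip stream.
        $text = '';
        foreach ( $this->mRevisions[$id] as $hash ) {
            $text .= $this->mSections[$hash];
        }
        return $text;
    }

    function compress() {
        return gzdeflate( serialize(
            array( $this->mSections, $this->mRevisions ) ) );
    }
}
?>

The merge step is where the 5937 kB vs. 508 kB difference comes from: a discussion page edit usually touches one section, so the other 50+ sections are stored exactly once per blob instead of once per revision.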