elwp@gmx.de wrote:
The background of my question is that I have written a Perl program that compresses page histories much better than the currently used algorithm. And now I want to write PHP code so that MediaWiki can access the data. But HistoryBlobStubs make this more complicated.
This is how my method works: All revision texts are split into sections (the delimiter is "\n=="). Unchanged sections are stored only once. Sections are sorted by their headings. Then everything is compressed with deflate().
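To make sure I understand the scheme: here is a minimal sketch of it in Python (not your Perl or the eventual PHP). The names `split_sections`, `compress_history`, and `decompress_revision` are mine, the "\x00" separator between pooled sections is an assumption, and I sort the full section text lexicographically, which (since a section starts with its heading) should amount to sorting by headings:

```python
import zlib

def split_sections(text):
    # Split at "\n==" boundaries, keeping the delimiter with the
    # section that follows it so revisions can be reassembled exactly.
    parts = text.split("\n==")
    return [parts[0]] + ["\n==" + p for p in parts[1:]]

def compress_history(revisions):
    # Every distinct section is stored only once; sorting groups
    # similar sections together, which helps deflate find matches.
    unique = sorted({sec for text in revisions for sec in split_sections(text)})
    index = {sec: i for i, sec in enumerate(unique)}
    # Each revision is reduced to a list of indices into the pool.
    encoded = [[index[sec] for sec in split_sections(text)] for text in revisions]
    # "\x00" as separator is an assumption; it must not occur in wikitext.
    blob = zlib.compress("\x00".join(unique).encode("utf-8"))
    return blob, encoded

def decompress_revision(blob, encoded, n):
    # Reconstructing any one revision needs the whole blob,
    # but no other revision's index list.
    unique = zlib.decompress(blob).decode("utf-8").split("\x00")
    return "".join(unique[i] for i in encoded[n])
```

If that sketch is right, it already suggests an answer to my second question below: any single revision can be rebuilt from the one compressed blob plus its own index list, rather than from a chain of earlier revisions.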
Two questions spring to mind:
Firstly, when you say "unchanged sections are stored only once", does this apply even if someone changes a section and someone else reverts it, or if someone copies a section to another page? Maybe all the pages should be split into sections, and all the sections stored individually?
Secondly, how strong will the dependence of a revision on earlier revisions be? In other words, how many (compressed) revisions must be retrieved in order to reconstruct the (uncompressed) text of a single revision?
Timwi