elwp(a)gmx.de wrote:
> The background of my
> question is that I have written a Perl program that compresses page
> histories much better than the currently used algorithm. And now I
> want to write PHP code so that MediaWiki can access the data. But
> HistoryBlobStubs make this more complicated.
> This is how my method works: All revision texts are split into sections
> (the delimiter is "\n=="). Unchanged sections are stored only once.
> Sections are sorted by their headings. Then everything is compressed
> with deflate().
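For readers following along, the scheme as described could be sketched roughly like this. This is not the author's actual Perl program, just a minimal Python approximation under stated assumptions: sections split on "\n==", each distinct section stored once, sections sorted by their heading line, and the pool deflated with zlib. The function names and the NUL-separated framing of the section pool are illustrative, not the real on-disk format.

```python
import zlib

def split_sections(text):
    # Split a revision into sections at "\n==" (the delimiter from the post).
    # The first chunk is the lead text; later chunks get their "==" restored.
    parts = text.split("\n==")
    return [parts[0]] + ["==" + p for p in parts[1:]]

def compress_history(revisions):
    # Store each distinct section once; a revision becomes a list of
    # indices into the section pool.
    pool = {}  # section text -> provisional index
    encoded = []
    for rev in revisions:
        ids = []
        for sec in split_sections(rev):
            if sec not in pool:
                pool[sec] = len(pool)
            ids.append(pool[sec])
        encoded.append(ids)
    # Sort sections by their first line (the heading) and remap indices.
    sections = sorted(pool, key=lambda s: s.splitlines()[0] if s else "")
    remap = {pool[s]: i for i, s in enumerate(sections)}
    encoded = [[remap[i] for i in ids] for ids in encoded]
    # Deflate the whole pool at once (NUL framing is illustrative only).
    blob = zlib.compress("\x00".join(sections).encode("utf-8"))
    return blob, encoded
```

A revision is rebuilt by inflating the pool once and rejoining its sections with "\n" in index order, so an identical section shared by many revisions costs its storage only once.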
Two questions spring to mind:
Firstly, when you say "unchanged sections are stored only once", does
this apply even if someone changes a section and someone else reverts
it, or if someone copies a section to another page? Maybe all the pages
should be split into sections, and all the sections stored individually?
Secondly, how strong will the dependence between a revision and the
previous revision be? In other words, how many (compressed) revisions
will have to be retrieved in order to reconstruct the (uncompressed)
text of just one revision?
Timwi