elwp@gmx.de wrote:
History blobs are not the only thing that can benefit from splitting revision texts into sections and sorting them. The size of XML page exports (with complete page histories) can be reduced as well.
This is the current structure (only relevant tags):
<page>
  <revision><text>text0</text></revision>
  <revision><text>text1</text></revision>
</page>
This would be the new structure:
<page>
  <section>sectiontext0</section>
  <section>sectiontext1</section>
  <section>sectiontext2</section>
  <revision><text type="sectionlist">0 1</text></revision>
  <revision><text type="sectionlist">0 2</text></revision>
</page>
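As a rough illustration of the proposed structure, here is a minimal Python sketch of how revision texts might be turned into a shared section table plus one section list per revision. The function names (split_into_sections, rebuild) and the choice to split on blank lines are assumptions made for this example; the proposal does not say how sections are delimited or ordered.

```python
def split_into_sections(revisions, sep="\n\n"):
    """Deduplicate sections across revisions.

    Returns (sections, section_lists), where section_lists[i] holds the
    indices into `sections` that make up revision i.
    Splitting on blank lines is an assumption for illustration only.
    """
    sections = []       # unique section texts, in first-seen order
    index = {}          # section text -> position in `sections`
    section_lists = []  # one list of section indices per revision
    for text in revisions:
        ids = []
        for part in text.split(sep):
            if part not in index:
                index[part] = len(sections)
                sections.append(part)
            ids.append(index[part])
        section_lists.append(ids)
    return sections, section_lists


def rebuild(sections, ids, sep="\n\n"):
    """Reassemble one revision text from its section list."""
    return sep.join(sections[i] for i in ids)
```

With two revisions that share their first section, this yields three stored sections and the section lists "0 1" and "0 2", matching the layout above. Each revision can be restored with rebuild.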
Can you show that this compresses significantly better than gzip? It certainly won't simplify dump processing.
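One way the question could be checked is to serialize the same page history in both layouts and compare the gzip-compressed sizes. The Python sketch below does that under stated assumptions: the helper names (current_xml, sectioned_xml, gzip_sizes) are hypothetical, sections are split on blank lines, and only the tags shown in this thread are emitted. It makes no claim about the result on real dumps. Since DEFLATE only matches repeats within a 32 KiB window, the outcome may depend heavily on revision sizes.

```python
import gzip
from xml.sax.saxutils import escape


def current_xml(revisions):
    """Serialize revisions in the existing layout (relevant tags only)."""
    body = "".join(
        f"<revision><text>{escape(t)}</text></revision>" for t in revisions
    )
    return f"<page>{body}</page>"


def sectioned_xml(revisions, sep="\n\n"):
    """Serialize revisions in the proposed layout.

    Splitting on blank lines is an assumption made for this sketch.
    """
    sections, index, lists = [], {}, []
    for text in revisions:
        ids = []
        for part in text.split(sep):
            if part not in index:
                index[part] = len(sections)
                sections.append(part)
            ids.append(str(index[part]))
        lists.append(" ".join(ids))
    sec = "".join(f"<section>{escape(s)}</section>" for s in sections)
    rev = "".join(
        f'<revision><text type="sectionlist">{ids}</text></revision>'
        for ids in lists
    )
    return f"<page>{sec}{rev}</page>"


def gzip_sizes(revisions):
    """Return {layout: (raw_bytes, gzipped_bytes)} for both layouts."""
    sizes = {}
    for name, build in (("current", current_xml), ("sectioned", sectioned_xml)):
        raw = build(revisions).encode("utf-8")
        sizes[name] = (len(raw), len(gzip.compress(raw)))
    return sizes
```

Calling gzip_sizes on the revision texts of a real page with a long history, rather than on toy input, would give numbers that actually bear on the question.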
-- brion vibber