Re: [Wikitech-l] Re: history compression

28 May 2005

Timwi:
...
  Two questions spring to mind:

 Firstly, when you say "unchanged sections are stored only once", does 
 this apply even if someone changes a section and someone else reverts 
 it, 
Yes, if both revision texts reside in the same history blob. Up to
20 consecutive revisions are stored in one blob.
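
To illustrate why a revert costs no extra space inside one blob, here is a minimal sketch, not the actual implementation. It assumes that a revision is split at heading lines (lines starting with "=") and that identical section texts are stored only once; the names split_sections and build_index are made up for this example.

```python
MAX_REVISIONS_PER_BLOB = 20  # "up to 20 consecutive revisions" per blob


def split_sections(text):
    """Split a revision into sections, starting a new one at each heading line.

    Assumption: a heading is any line that starts with "=".
    """
    sections, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith("=") and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))
    return sections


def build_index(revisions):
    """Return (section_lists, sections) for the revisions of one blob.

    sections holds every distinct section text once; section_lists holds,
    per revision, the numbers of the sections it consists of.
    """
    if len(revisions) > MAX_REVISIONS_PER_BLOB:
        raise ValueError("too many revisions for one blob")
    sections = []       # distinct section texts, in first-seen order
    numbers = {}        # section text -> its number in `sections`
    section_lists = []  # one list of section numbers per revision
    for text in revisions:
        ids = []
        for sec in split_sections(text):
            if sec not in numbers:
                numbers[sec] = len(sections)
                sections.append(sec)
            ids.append(numbers[sec])
        section_lists.append(ids)
    return section_lists, sections


# Revision 3 reverts revision 2, so it reuses the sections of revision 1.
v1 = "Intro\n== A ==\nold\n"
v2 = "Intro\n== A ==\nnew\n"
lists, secs = build_index([v1, v2, v1])
print(lists)  # [[0, 1], [0, 2], [0, 1]]
```

The reverted revision adds only a list of section numbers, no new section text.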

...
  or if someone copies a section to another page?

No.

...
  Maybe all the pages 
 should be split into sections, and all the sections stored individually? 
I doubt that this would improve the compression much, because texts aren't
copied that often.

...
  Secondly, how great will the dependence between a revision and the 
 previous revision be? In other words, how many (compressed) revisions 
 will have to be retrieved in order to reconstruct the (uncompressed) 
 text of just one revision? 
The complete history blob must be decompressed, of course, but no previous
revisions need to be reconstructed. The uncompressed history blob begins
with a list of section numbers for each revision, followed by a
(position, length) pair for each section. So when a revision text is to be
extracted, this is what happens:
* uncompress history blob
* look up section list for the requested revision
* look up section offsets and lengths
* concatenate sections
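
The steps above can be sketched as follows. This is only an illustration under stated assumptions: the byte layout (a 4-byte length, a JSON header, then the section data), the use of zlib, and the names pack_blob and extract_revision are all invented for the example and are not the real blob format.

```python
import json
import struct
import zlib


def pack_blob(section_lists, sections):
    """Build a toy blob: [4-byte header length][JSON header][section bytes].

    Hypothetical layout, used only so that extract_revision has input.
    """
    spans, pos = [], 0
    for sec in sections:
        spans.append((pos, len(sec)))  # (position, length) of each section
        pos += len(sec)
    header = json.dumps({"revisions": section_lists, "spans": spans})
    header = header.encode("utf-8")
    raw = struct.pack(">I", len(header)) + header + b"".join(sections)
    return zlib.compress(raw)


def extract_revision(blob, rev):
    """Return the text (bytes) of revision number `rev` from a toy blob."""
    # 1. uncompress history blob
    raw = zlib.decompress(blob)
    (header_len,) = struct.unpack_from(">I", raw, 0)
    header = json.loads(raw[4:4 + header_len])
    body = raw[4 + header_len:]
    # 2. look up section list for the requested revision
    section_numbers = header["revisions"][rev]
    # 3. look up section offsets and lengths
    spans = [header["spans"][n] for n in section_numbers]
    # 4. concatenate sections
    return b"".join(body[off:off + length] for off, length in spans)


sections = [b"Intro\n", b"== A ==\nold\n", b"== A ==\nnew\n"]
blob = pack_blob([[0, 1], [0, 2], [0, 1]], sections)
print(extract_revision(blob, 1))  # b'Intro\n== A ==\nnew\n'
```

Only the requested revision is assembled; the other revisions in the blob are never reconstructed.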

This is an example header (first 20 revisions of the German article
"Stern"):

00000020 00000025 00000142 00000260 00000001 # 20 revisions, 25 different sections
0 # first revision has no heading: only one section
1 2 3
4 5 6
4 5 6 # conversion script: nothing changed
7 5 8
7 9 8
10 9 8
7 9 8
7 11 8
12 11 8
12 13 8
12 14 8
15 14 8
16 14 8
17 14 8
18 14 8
19 14 8
19 20 8
21 22 23
21 24 23
0 2579 # offset and length of the first section
2579 1176
...

