On Mon, 12 Sep 2005 23:40:55 +1000, Tim Starling wrote:
[...]
> No, these possibilities have not been rigorously examined. Note that
> those aren't really reasons; they're illustrative only, and addressing
> each of them in turn does not guarantee that your compression algorithm
> is effective. I was just describing my train of thought in arriving at
> the idea that LZ77 might be worth a try. If you have your own idea,
> please download a dump and try it out.
I have ideas; I may try that at some point. My instinct is that a diff
scheme which accounted for intra-line changes as well as block text moves
would yield the most compression, and that if it alone didn't, then
compressing the resulting diff stream the way the entire revision set is
currently compressed would. The question I haven't delved into is how
difficult that would be to implement, and how computation-intensive it
would be compared to what is currently done.
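
To make the comparison concrete, here is a minimal sketch in Python (my
choice of language for illustration; MediaWiki itself is PHP) of the two
strategies: compressing the concatenated revisions as a whole versus
compressing a stream of per-revision diffs. difflib's unified_diff is
line-based, so it doesn't capture the intra-line changes or block moves
I'm describing; a custom differ would replace it.

    import difflib
    import zlib

    def diff_stream(revisions):
        # Store the first revision whole, then line-based unified
        # diffs between each pair of consecutive revisions.
        parts = [revisions[0]]
        for old, new in zip(revisions, revisions[1:]):
            delta = difflib.unified_diff(old.splitlines(keepends=True),
                                         new.splitlines(keepends=True))
            parts.append(''.join(delta))
        return '\x00'.join(parts)   # crude record separator

    def compare(revisions):
        whole = '\x00'.join(revisions).encode('utf-8')
        diffs = diff_stream(revisions).encode('utf-8')
        print('concatenated revisions, zlib:', len(zlib.compress(whole, 9)))
        print('diff stream, zlib:', len(zlib.compress(diffs, 9)))

On a typical edit history, where most revisions differ by only a few
lines, the diff stream should compress considerably further; measuring
that on a real dump is exactly the experiment you're suggesting.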
> The main thing which put me off implementing diff-based compression was
> the complexity, in particular the required schema change. If you need to
> load some large number of diffs in order to generate a revision, those
> diffs need to be loaded in a single database query, if any kind of
> efficiency is to be reached.
Yes, I realise that.
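Something like the following single-query pattern is what I'd have in
mind: pull the base text and every subsequent diff for the page in one
round trip, then replay them in order. The rev_text table and its
columns are hypothetical, not MediaWiki's actual schema, and the
ndiff-style delta is a stand-in for whatever compact diff format were
chosen.

    import difflib
    import sqlite3

    def load_revision(conn, page_id, rev_id):
        # One query fetches the stored full text plus every diff up
        # to and including the requested revision.  (A real query
        # would start from the latest full-text checkpoint instead
        # of the beginning of the page's history.)
        rows = conn.execute(
            "SELECT is_full, body FROM rev_text"
            " WHERE page_id = ? AND rev_id <= ?"
            " ORDER BY rev_id", (page_id, rev_id)).fetchall()
        text = None
        for is_full, body in rows:
            if is_full:
                text = body
            else:
                # Stand-in patch step: ndiff deltas can be replayed
                # without the base text; a compact diff format would
                # be applied against `text` here instead.
                text = ''.join(difflib.restore(
                    body.splitlines(keepends=True), 2))
        return text

    # Usage (hypothetical database file):
    # conn = sqlite3.connect('revisions.db')
    # print(load_revision(conn, page_id=1, rev_id=42))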
> In other words, don't do a proof of principle and then nag me to write
> the real thing, as if that were the easy part.
To start with, I want to get an idea of what's currently done and why,
and of which ideas have previously been proposed and/or rejected, and on
what grounds. I understand that the main wikis are huge and that
performance issues are important.
--
http://members.dodo.com.au/~netocrat