On Wed, Jan 7, 2009 at 6:25 PM, Keisial <keisial@gmail.com> wrote:
Anthony was doing some similar work in October, based on RCS and reverse deltas. His results also reduced the dump sizes notably.
http://blog.p2pedia.org/2008/10/anarchism.html
Using skip-deltas I think you could make a system fast enough to run live. At the very least it could be used as part of an incremental dump system. Using *smart* skip-deltas, you'd resolve the inefficiencies due to page-blanking vandalism.
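For concreteness, here's a minimal sketch of the base-selection rule I mean, along the lines of Subversion's skip-deltas: store each revision as a delta against the revision whose number is the current one with its lowest set bit cleared. (The revision numbering and function names are just for illustration.)

    def skip_delta_base(rev):
        # Skip-delta base: the revision number with its lowest set bit
        # cleared; revision 0 is stored as a full text.
        return rev & (rev - 1)

    def reconstruction_chain(rev):
        # Revisions to visit, newest first, when rebuilding `rev`.
        # Length is popcount(rev) + 1, i.e. O(log rev).
        chain = [rev]
        while rev > 0:
            rev = skip_delta_base(rev)
            chain.append(rev)
        return chain

    # e.g. reconstruction_chain(1000) touches 7 revisions, not 1000.

The *smart* variant would presumably just override that rule wherever the delta comes out larger than the text itself, which is what happens around blanking vandalism.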
I never managed to create an implementation, though. In particular, oddly enough, I couldn't find a standalone diff program that would produce deltas in RCS format, and I never got around to writing my own.
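For what it's worth, something along these lines (using Python's difflib; I haven't checked the output against what RCS actually expects, so treat the exact command syntax as an assumption) is roughly what I had in mind:

    import difflib

    def rcs_delta(old_lines, new_lines):
        # Emit RCS-style edit commands: "d<line> <count>" to delete,
        # "a<line> <count>" followed by the added lines to append.
        # Line numbers are 1-based and refer to the old text.
        out = []
        matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag in ('delete', 'replace'):
                out.append('d%d %d' % (i1 + 1, i2 - i1))
            if tag in ('insert', 'replace'):
                out.append('a%d %d' % (i2, j2 - j1))
                out.extend(line.rstrip('\n') for line in new_lines[j1:j2])
        return '\n'.join(out)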
One improvement over the diff format used by RCS would be smarter breakpoints, since wikitext tends to have a lot of really long lines with no line breaks. Some simple heuristics for guessing sentence breaks would probably help there. They wouldn't have to be perfect, since reconstruction just concatenates all the text back together regardless of where it was broken apart. Of course, the results would be somewhat language-specific.
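Something as crude as this would probably do (the regex is just a guess at what works for English wikitext); the only hard requirement is that the split be lossless, since the chunks get concatenated straight back together:

    import re

    def chunk_line(line):
        # Split a long wikitext line at guessed sentence boundaries.
        # Lossless by construction: ''.join(chunk_line(s)) == s, so the
        # guesses only need to be deterministic, not correct.
        breaks = [m.end() for m in re.finditer(r'[.!?](?=\s+[A-Z\[])', line)]
        starts = [0] + breaks
        ends = breaks + [len(line)]
        return [line[a:b] for a, b in zip(starts, ends)]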
I like Robert's approach, though. I'd love to see the results of applying the skip-delta concept to it, to achieve O(lg N) random-access reads (with N being the number of revisions of the article being accessed). Or maybe I'm misreading him and he's already accomplished that?