Anthony wrote:
Using skip-deltas I think you could make a system fast
enough to run live.
At the very least it could be used as part of an incremental dump system.
Using *smart* skip-deltas, you'd resolve the inefficiencies caused by
page-blanking vandalism.
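For the skip-delta idea, here's a minimal sketch of the usual scheme (as in Subversion): revision n is stored as a delta against n with its lowest set bit cleared, so reconstructing any revision touches only O(log n) deltas. The function names are mine, just for illustration.

```python
def skip_delta_base(n):
    """Base revision for rev n: clear the lowest set bit of n.
    Rev 13 (binary 1101) deltas against 12 (1100), which deltas
    against 8 (1000), which deltas against 0."""
    return n & (n - 1)

def delta_chain(n):
    """Revisions that must be read to reconstruct rev n."""
    chain = [n]
    while n:
        n = skip_delta_base(n)
        chain.append(n)
    return chain
```

So even for a page with a million revisions, reconstruction never applies more than about twenty deltas.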
One more possibility is to compute an MD5 of every revision, then diff only
between revisions with unique MD5s.
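The MD5 dedup might look like this: identical revisions (e.g. reverts) collapse to one stored text, and diffs are only ever computed between the unique texts. This is a sketch, not MediaWiki's actual schema; the function name is made up.

```python
import hashlib

def dedup_revisions(revisions):
    """Collapse byte-identical revisions via MD5.
    Returns (unique_texts, index): index[i] says which unique
    text revision i maps to, so a revert costs no new storage."""
    seen = {}      # md5 hex digest -> position in unique_texts
    unique_texts = []
    index = []
    for text in revisions:
        h = hashlib.md5(text.encode("utf-8")).hexdigest()
        if h not in seen:
            seen[h] = len(unique_texts)
            unique_texts.append(text)
        index.append(seen[h])
    return unique_texts, index
```

For a history like ["v1", "v2", "v1"] (an edit then a revert), only two texts are stored and the revert is just a back-reference.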
One improvement over the diff format used by RCS would
be to use smarter
breakpoints, since wikitext tends to have a lot of really long lines with no
line breaks. Using some simple heuristics to guess at sentence breaks would
probably be useful there. It wouldn't have to be perfect, since an imperfect
breakpoint only costs some compactness, never correctness.
I suggest looking into wdiff
(http://www.gnu.org/software/wdiff/).
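A rough sketch of the sentence-breakpoint idea, using Python's difflib rather than the RCS format: split the wikitext at guessed sentence boundaries and diff those tokens instead of whole lines. The regex is a deliberately crude heuristic of my own; as noted above, a bad break only makes the delta larger, never wrong.

```python
import re
import difflib

def sentence_tokens(text):
    """Crude heuristic: break after sentence-ending punctuation
    followed by whitespace, or at explicit newlines."""
    return re.split(r'(?<=[.!?])\s+|\n', text)

def sentence_diff(old, new):
    """Diff at sentence granularity; return only the changed spans."""
    sm = difflib.SequenceMatcher(None,
                                 sentence_tokens(old),
                                 sentence_tokens(new))
    return [op for op in sm.get_opcodes() if op[0] != "equal"]
```

On a long wikitext line where one sentence changed, this yields a single small replace span, where a line-based diff would rewrite the entire line.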