Magnus Manske wrote:
AFAIK, it is still the same mechanism as in Phase II, but Brion is working on on-the-fly compression for the old texts, as the old table gets quite large.
Magnus
The big win is not in compressing each individual version (say, version 6 of the article "London"), but in compressing the entire sequence of versions for each article, since so much is common between versions 6, 7, and 8 of the same article.
This optimization is what RCS does by storing the current text in full and only the diffs needed to reproduce each next earlier version. Now, RCS has its roots in the early 1980s and does this (1) in a plain text file, and (2) in one long sequence of diffs, which makes it very slow to extract version 1 of a text if the current version is 2314. I think that some of the more modern version control systems (?? aegis, arch, bitkeeper, darcs, perforce, subversion, ??) play around with hierarchical schemes where every Nth version is stored in full.
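To make that concrete, here is a rough sketch in Python of such a keyframe scheme: every Nth revision is stored in full, the ones in between only as a diff against their predecessor, so restoring an old version replays at most N-1 diffs instead of walking the whole chain back to revision 1. The names (ArticleHistory, SNAPSHOT_EVERY, make_delta) are mine, invented for illustration, not taken from any of the systems above.

    import difflib

    SNAPSHOT_EVERY = 20  # assumed interval between full copies ("keyframes")

    def make_delta(old, new):
        """Encode `new` relative to `old`, keeping only the changed lines."""
        ops = []
        for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, old, new).get_opcodes():
            if tag == "equal":
                ops.append(("copy", i1, i2))       # reuse old[i1:i2] unchanged
            else:                                  # replace / insert / delete
                ops.append(("lines", new[j1:j2]))  # store only the new lines
        return ops

    def apply_delta(old, delta):
        """Rebuild the newer text from the older text plus its delta."""
        out = []
        for op, *args in delta:
            if op == "copy":
                out.extend(old[args[0]:args[1]])
            else:
                out.extend(args[0])
        return out

    class ArticleHistory:
        def __init__(self):
            self.entries = []  # per revision: ("full", lines) or ("delta", delta)

        def add_revision(self, lines):
            lines = list(lines)
            if len(self.entries) % SNAPSHOT_EVERY == 0:
                self.entries.append(("full", lines))
            else:
                prev = self.get_revision(len(self.entries) - 1)
                self.entries.append(("delta", make_delta(prev, lines)))

        def get_revision(self, rev):
            # walk back to the nearest full copy, then replay diffs forward;
            # this costs at most SNAPSHOT_EVERY - 1 applications, instead of
            # a chain reaching all the way back to the first revision
            base = rev
            while self.entries[base][0] != "full":
                base -= 1
            text = list(self.entries[base][1])
            for i in range(base + 1, rev + 1):
                text = apply_delta(text, self.entries[i][1])
            return text

RCS does the mirror image of this (latest version in full, diffs pointing backwards), but the cost argument is the same.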
Still, when vandalism is reverted, versions 6 and 8 might be identical, so storing the two diffs (back and forth) would be less than optimal. Further, when pieces of text are moved between two articles, the best compression would have to consider the whole table. Perhaps MySQL (or the underlying filesystem) should implement the compression.
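For the revert case there is a cheap trick: key every stored text by a content hash, so that an identical later revision just points at the text that is already in the table instead of adding another pair of diffs. Again only an illustrative sketch (the names are mine), not a claim about how MediaWiki or MySQL would do it:

    import hashlib

    stored_texts = {}  # content hash -> stored text (or a row id in a text table)
    history = []       # per revision: just the hash, in chronological order

    def text_key(lines):
        """Content hash of a revision; identical texts get identical keys."""
        return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()

    def add_revision(lines):
        key = text_key(lines)
        stored_texts.setdefault(key, list(lines))  # a revert re-uses the old copy
        history.append(key)

The cross-article case is harder, since a hash only catches texts that are literally identical as a whole, which is why compression across the whole table would have to live lower down, in MySQL or the filesystem.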
I don't know whether any existing version control system uses a relational database backend (MySQL, PostgreSQL, ...), but this would be an interesting combination independent of Wikipedia, so perhaps it should be developed as a generic component that can be used from Wikipedia as well as from other applications. In particular, the way MediaWiki stores the changelog in a searchable relational table is a great improvement over primitive file-based systems such as RCS and CVS.
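To show what I mean by a searchable changelog, something along these lines would already go a long way (the table and column names are invented for the example, not the actual MediaWiki schema):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE revision (
            rev_id     INTEGER PRIMARY KEY,
            title      TEXT NOT NULL,
            editor     TEXT NOT NULL,
            timestamp  TEXT NOT NULL,
            comment    TEXT,
            full_text  TEXT NOT NULL   -- or a delta, as sketched above
        )
    """)

    db.executemany(
        "INSERT INTO revision (title, editor, timestamp, comment, full_text)"
        " VALUES (?, ?, ?, ?, ?)",
        [
            ("London", "Alice", "2003-07-01 10:00", "initial stub",
             "London is a city."),
            ("London", "Bob", "2003-07-02 09:30", "expand intro",
             "London is the capital of England."),
        ],
    )

    # The changelog is now an ordinary SQL query, e.g. the history of one article:
    for row in db.execute(
        "SELECT rev_id, editor, timestamp, comment FROM revision"
        " WHERE title = ? ORDER BY timestamp",
        ("London",),
    ):
        print(row)

With RCS or CVS, answering the same question means parsing rlog output file by file.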