I like this idea of preserving basic information in case of a
catastrophe. Actually, someone has thought about this for the content
of the French Wikipedia, but so far not much work has been done on it.
Longevity, just like any human project, can only be reached by
building on previous experience. Before you can send a rocket to the
moon, you will have to build the first rocket, then send the first
rocket into the stratosphere, then send a rocket into orbit.
For preserving digital information, there is a lot of experience out
there, especially in maintaining software source code over long
periods of time. There are so many such projects that one can already
speak of a "natural selection" among survival strategies. The most commonly cited are
"Keep it Simple, Stupid" (KISS) and "Lots of Copies Keep Stuff Safe"
(LOCKSS). Proprietary source code that exists in only one copy can
survive only if the company survives, which requires profitable and
slowly emerging areas of business. Open source code can survive even
if the company goes bankrupt. But survival also requires relevancy.
Irrelevant software is not maintained, and easily forgotten during
emergencies or relocation. If it goes away, nobody bothers to ask for
it. Dependence on complicated and immature tools (programming
languages, version control systems, filesystems, database formats) can
cause the loss of source code.
Good examples of long-lived software are the GNU Emacs text editor (18
years?) and the Linux operating system kernel (12 years?). Both are
still actively used, developed, and maintained.
The idea of engraving Wikipedia contents on physical media fails to meet
many of these criteria. For example, nobody uses such media in
everyday life and the knowledge of how to use them is not widespread.
If a mistake is made in the engraving, such that it is impossible to
read the contents back, very few people are able to detect this
mistake. There is no previous experience in rescuing information
from such engravings.
Paper print-out is a little better than engraving, because many people
know how to read from paper. There is also plenty of
experience (several centuries) from long-term preservation of
(acid-free) paper. However, the error rates from scanning and OCR are
such that restoration might be difficult or impractical. Paper might
be best suited for a printed Wikipedia 1.0 that serves as a printed
encyclopedia, since that printout has a use in itself, besides
preserving the contents. Paper might be less suitable for preserving
the edit history of every article, since nobody would ever read that
other than for restoration.
During my 22 years of programming, I have shifted operating systems
and programming languages many times, but only once (in 1990) have I
shifted character sets (from ASCII to ISO 8859-1). Soon I will shift
again to Unicode/UTF-8, which I hope to use for the rest of my life.
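For plain text, the shift from ISO 8859-1 to UTF-8 can be done mechanically. The following is a minimal sketch in Python, which is my own choice for illustration. The function name `latin1_to_utf8` is hypothetical, and the sketch assumes the old files really are ISO 8859-1 and not some other 8-bit character set:

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO 8859-1 (Latin-1) bytes as UTF-8 (hypothetical helper)."""
    # Every byte value 0x00-0xFF maps to exactly one code point in ISO 8859-1,
    # so decoding never fails. The ASCII subset comes out byte-for-byte unchanged.
    return data.decode("iso-8859-1").encode("utf-8")


# Example: Swedish letters that take one byte in Latin-1 take two in UTF-8.
legacy = "Åsa, Öland, Västerås".encode("iso-8859-1")
converted = latin1_to_utf8(legacy)
print(converted.decode("utf-8"))
```

Because Latin-1 decoding accepts every possible byte, this conversion cannot detect a file that was actually written in a different charset. That check still needs a human eye or some heuristic.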
I could still read the plain text files I wrote in 1982, if I had
saved them and copied them to each new computer. Actually, I still keep
some print-outs from 1984 and I should retype them before the ink
fades away. I still have saved e-mails from 1986 on my current disk.
Wikipedia's current method of distributing digital dumps of the entire
database, in combination with keeping the project current and
relevant, is the best survival strategy I can think of. Since it has
now survived 4 years, we can hope that it will survive 10 years. And
when that is reached, we can hope that it will survive for 10 more.
I have a website (runeberg.org) that dates back to 1992. Actually it
was on Gopher first, but Gopher sites are not in much use today and
nobody bothers to preserve them. My site migrated to WWW in 1993-94.
The pages are kept under RCS, and it is *great fun* to be able to trace
more than 10 years of RCS history online.
Lars Aronsson (lars(a)aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org/