Anthere wrote:
I like this idea of preserving basic information in case of a catastrophe. Actually, someone thought about the content on the French Wikipedia, but so far not much work has gone into it.
Longevity, as in any human project, can only be achieved by building on previous experience. Before you can send a rocket to the moon, you will have to build the first rocket, then send the first rocket into the stratosphere, then send a rocket into orbit.
For preserving digital information, there is a lot of experience out there, especially in maintaining software source code over long periods of time. There are so many such projects that you can already talk of a "natural selection" among survival strategies. The most commonly cited are "Keep it Simple, Stupid" (KISS) and "Lots of Copies Keep Stuff Safe" (LOCKSS). Proprietary source code that exists in only one copy can survive only if the company survives, which requires profitable and slowly emerging areas of business. Open source code can survive even if the company goes bankrupt. But survival also requires relevance. Irrelevant software is not maintained, and is easily forgotten during emergencies or relocation. If it goes away, nobody cares to ask for it. Dependence on complicated and immature tools (programming languages, version control systems, filesystems, database formats) can cause the loss of source code.
Good examples of long-lived software are the GNU Emacs text editor (18 years?) and the Linux operating system kernel (12 years?). Both are still actively used, developed, and maintained.
The idea of engraving Wikipedia contents on physical media fails to meet many of these criteria. For example, nobody uses such media in everyday life and the knowledge of how to use them is not widespread. If a mistake is made in the engraving, such that it is impossible to read the contents back, very few people would be able to detect it. There is no previous experience in rescuing information from such engravings.
Paper print-outs are a little better than engraving, because many people know how to read from paper. There is also plenty of experience (several centuries) of long-term preservation of (acid-free) paper. However, the error rates from scanning and OCR are such that restoration might be difficult or impractical. Paper might be best suited for a printed Wikipedia 1.0 that serves as a printed encyclopedia, since that print-out has a use in itself, besides preserving the contents. Paper might be less suitable for preserving the edit history of every article, since nobody would ever read that other than for restoration.
During my 22 years of programming, I have shifted operating systems and programming languages many times, but only once (in 1990) have I shifted character sets (from ASCII to ISO 8859-1). Soon I will shift again to Unicode/UTF-8, which I hope to use for the rest of my life. If I had saved them and copied them to my next computer, I could still read the plain text files I wrote in 1982. Actually, I still keep some print-outs from 1984 and I should retype them before the ink fades away. I still have saved e-mails from 1986 on my current disk.
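To illustrate how small a character set shift is compared to a change of file format, here is a rough sketch in Python of re-encoding an old ISO 8859-1 text file as UTF-8. The function names and the sample word are only my assumptions for the example, not anything I actually ran on those old files.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO 8859-1 (Latin-1) bytes as UTF-8."""
    # Every byte value is a valid Latin-1 character, so decoding never fails.
    text = data.decode("iso-8859-1")
    return text.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    old = b"sm\xf6rg\xe5sbord"        # hypothetical sample stored as ISO 8859-1
    new = latin1_to_utf8(old)
    print(is_valid_utf8(old))          # the old bytes are not valid UTF-8
    print(new.decode("utf-8"))         # the converted text reads back correctly
```

Pure ASCII text passes through unchanged, since ASCII is a subset of both encodings. That is part of why plain text ages so well.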
Wikipedia's current method of distributing digital dumps of the entire database, in combination with keeping the project current and relevant, is the best survival strategy I can think of. Since it has now survived 4 years, we can hope that it will survive 10 years. And when that is reached, we can hope that it will survive for 10 more.
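Many copies only help if somebody can tell a good copy from a damaged one. As a minimal sketch, assuming each mirror holds the same dump as a plain file, one could compare SHA-256 digests and flag the copies that disagree with the majority. This is not how Wikipedia or the LOCKSS software actually works. The function names are made up for the example.

```python
import hashlib
from collections import Counter


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_damaged_copies(paths):
    """Return the paths whose digest differs from the most common digest.

    Assumes the majority of copies are intact. With only two copies that
    disagree, there is no way to tell which one is the good one.
    """
    digests = {str(p): sha256_of(p) for p in paths}
    if not digests:
        return []
    majority, _count = Counter(digests.values()).most_common(1)[0]
    return [p for p, d in digests.items() if d != majority]
```

With three or more independent copies, a single damaged one stands out and can be replaced from any of the others.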
I have a website (runeberg.org) that dates back to 1992. Actually it was on Gopher first, but Gopher sites are not in much use today and nobody cares to preserve them. My site migrated to WWW in 1993-94. The pages are kept under RCS and it is *great fun* to be able to trace more than 10 years of RCS history online, http://runeberg.org/rc.pl?action=history&src=admin/foreign