On 02/06/11 19:52, George Herbert wrote:
On Thu, Jun 2, 2011 at 10:55 AM, David Gerard <dgerard@gmail.com> wrote:
On 2 June 2011 18:48, Fae <faenwp@gmail.com> wrote:
In 2016 San Francisco has a major earthquake and the servers and operational facilities for the WMF are damaged beyond repair. The emergency hot switchover to Hong Kong is delayed by an ongoing DoS attack from Eastern Europe. The switchover eventually appears successful and data is synchronized with Hong Kong for the next three weeks. At the end of those three weeks, amid a massive raft of escalating complaints about images disappearing, it is realized that this is the result of local data caches expiring. The DoS attack covered the tracks of a passive data worm that only activates during back-up cycles, and the loss is irrecoverable because backups more than two weeks old are automatically deleted. With no archive strategy in place, it is estimated that the majority of digital assets have been permanently lost, and estimates for a 60% partial reconstruction from remaining cache snapshots and independent global archive sites run to over two years of work.
This sort of scenario is why some of us have a thing about the backups :-)
(Is there a good image backup of Commons and of the larger wikis, and
- and this one may be trickier - has anyone ever downloaded said
backups?)
- d.
I've floated this to Erik a couple of times, but if the Foundation would like an IT disaster response / business continuity audit, I can do that.
Tape is -- still -- your friend here. Flip the write-protect after writing, have two sets of off-site tapes, one copy of each in each of two secure and widely separated off-site locations run by two different organizations, and you're sorted.
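To make the invariant explicit, here's a tiny illustrative Python sketch of that scheme expressed as a checkable policy; the names and structure are mine, not any real WMF configuration:

# Purely illustrative: the scheme above as a checkable invariant.
tape_copies = [
    {"location": "site A", "operator": "org 1", "write_protected": True},
    {"location": "site B", "operator": "org 2", "write_protected": True},
]

def scheme_ok(copies):
    # Two copies, at distinct locations, run by distinct organizations,
    # all with the write-protect tab flipped after writing.
    return (len(copies) == 2
            and len({c["location"] for c in copies}) == 2
            and len({c["operator"] for c in copies}) == 2
            and all(c["write_protected"] for c in copies))

assert scheme_ok(tape_copies)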
Tape is the dumb backstop that will keep the data even when your supposedly infallible replicated and redundant systems fail. For example, it got Google out of a hole quite recently when they had to restore a significant number of Gmail accounts from tape. (see http://www.talkincloud.com/the-solution-to-the-gmail-glitch-tape-backup/ )
And, unlike other long-term storage media, tape has a long track record: there is a good understanding of its practical lifespan and risks, and well-understood procedures exist for making and verifying duplicate sub-master copies onto newer tape technologies over time to extend archive life, etc.
If we say that Wikimedia Commons currently has ~10M images, and if we allow 1 MB per image, that's only 10 TB: that will fit nicely on seven LTO-5 tapes. Using LTFS would also make data access easier and improve long-term data robustness. If you like, you can slip in a complete dump of the Mediawiki source and Commons database on each tape, as well.
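For anyone who wants to check the arithmetic, here's a quick Python sketch (LTO-5 native capacity is 1.5 TB, uncompressed):

import math

images = 10_000_000          # ~10M Commons images, as above
bytes_per_image = 1_000_000  # allow 1 MB per image
lto5_capacity = 1.5e12       # LTO-5 native capacity, in bytes

total = images * bytes_per_image          # 10 TB
tapes = math.ceil(total / lto5_capacity)  # -> 7 tapes per copy

print(f"{total / 1e12:.0f} TB -> {tapes} LTO-5 tapes per copy")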
Even if I'm wrong by an order of magnitude, and 140 tapes are needed instead of 14 (two off-site copies of each of the seven), that's still less than $10k of media -- and I wouldn't be surprised if tape storage companies were eager to vie to be the one that can claim to donate the media and drives providing Wikipedia's long-term backup system.
With two tape drives running in parallel at an optimal 140 MB/s each, the whole backup would take less than a day. Even if I were wrong about both the writing speed and the archive size by an order of magnitude each, it would still take less than three months.
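And the timing arithmetic, under the same assumptions:

total = 10e12               # 10 TB archive, as above
drives, speed = 2, 140e6    # two drives at 140 MB/s each

best_hours = total / (drives * speed) / 3600
print(f"best case: about {best_hours:.0f} hours")   # ~10 hours

# Pessimistic case: 10x the data at a tenth of the speed.
worst_days = (total * 10) / (drives * speed / 10) / 86400
print(f"worst case: about {worst_days:.0f} days")   # ~41 days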
The same tape systems could also, trivially, be used to back up all the other WMF sites along similar lines.
-- Neil