On Fri, Sep 5, 2008 at 8:54 AM, Tim Starling <tstarling(a)wikimedia.org> wrote:
Thanks. Saved me a step… and fortunately I already had base conversion
code handy.
Sadly, it takes a long time to SHA1 many terabytes of data. I started the
process this morning, but I had made an error in assuming the xargs
parallel option (-P) wouldn't result in badly interleaved output,
since it didn't in a limited test. It turns out it did, so I had to start
the hashing over again.
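For what it's worth, one way to sidestep the interleaving is to have each parallel job append to its own output file and concatenate afterward, so no two processes ever write to the same stream. A minimal sketch; the paths, the -P level, and the "hashes" scratch directory are illustrative, not from my actual setup:

```shell
# Hash a tree in parallel without interleaved output: each worker shell
# appends to a file named after its own PID, then the parts are merged.
mkdir -p hashes
find images -type f -print0 \
  | xargs -0 -P 4 -I{} sh -c 'sha1sum "$1" >> "hashes/$$.part"' _ {}
cat hashes/*.part > all-hashes.sha1
```

Spawning one sh per file is slower than letting xargs batch arguments, but it guarantees each sha1sum line lands whole in exactly one part file.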
(Might I suggest, beyond not invoking unlink(), that if your filesystem
can handle some additional inode pressure, you make daily or
weekly hardlink snapshots in a directory tree inaccessible to the web
front end? It's not as good as a real backup system, but it's cheap
and easy. On my system (XFS) I have a dozen or so hardlink snapshots
of the Wikimedia image collection: while I was getting updates I was
creating snapshots which roughly coincided with the released database
dumps.)
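In case it's useful, the snapshot scheme above can be as simple as one cp invocation per day. A sketch assuming GNU coreutils cp and illustrative paths (not my actual layout):

```shell
# Cheap hardlink snapshot: cp -al copies the directory structure but
# hardlinks every file, so unchanged files consume no extra data blocks,
# only inodes/directory entries. Deleting a file from the live tree
# later leaves the snapshot's link intact.
live=/srv/uploads
snap=/srv/snapshots/$(date +%F)   # dated directory, e.g. .../2008-09-05
cp -al "$live" "$snap"
```

rsync with --link-dest against the previous snapshot does the same job and copes better with files that change in place, since cp -al shares the inode and would let an in-place modification bleed into old snapshots.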
Since the hashing is going to take a while, I'll hop on IRC and pass
you a link to a tar with the file name matches. It turns out that I have
*most* of them based on name match alone. (Dunno why my earlier count
was wrong… perhaps a Unicode handling bug on my part; I'd just woken
up when I sent my prior email.)