On Fri, Sep 5, 2008 at 8:54 AM, Tim Starling <tstarling(a)wikimedia.org> wrote:
Thanks. Saved me a step… and fortunately I already had base conversion
code handy.
Sadly, it takes a long time to SHA1 many terabytes of data. I started the
process this morning, but I had made an error in assuming the xargs
parallel option (-P) wouldn't result in badly interleaved output,
since it didn't in a limited test. It turns out it did, so I had to start
the hashing over again.
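For what it's worth, one way to sidestep the interleaving is to have each parallel job append to its own output file and concatenate afterward, so no two processes ever write to the same stream. A minimal sketch; the paths, the -P level, and the "hashes" scratch directory are illustrative, not from my actual setup:

```shell
# Hash a tree in parallel without interleaved output: each worker shell
# appends to a file named after its own PID, then the parts are merged.
mkdir -p hashes
find images -type f -print0 \
  | xargs -0 -P 4 -I{} sh -c 'sha1sum "$1" >> "hashes/$$.part"' _ {}
cat hashes/*.part > all-hashes.sha1
```

Spawning one sh per file is slower than letting xargs batch arguments, but it guarantees each sha1sum line lands whole in exactly one part file.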
(Might I suggest, beyond not invoking unlink(), that if your filesystem
can handle some additional inode pressure, you make daily or
weekly hardlink snapshots in a directory tree inaccessible to the web
front end? It's not as good as a real backup system, but it's cheap
and easy. On my system (XFS) I have a dozen or so hardlink snapshots
of the Wikimedia image collection: while I was getting updates I was
creating snapshots which roughly coincided with the released database
dumps.)
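In case it's useful, the snapshot scheme above can be as simple as one cp invocation per day. A sketch assuming GNU coreutils cp and illustrative paths (not my actual layout):

```shell
# Cheap hardlink snapshot: cp -al copies the directory structure but
# hardlinks every file, so unchanged files consume no extra data blocks,
# only inodes/directory entries. Deleting a file from the live tree
# later leaves the snapshot's link intact.
live=/srv/uploads
snap=/srv/snapshots/$(date +%F)   # dated directory, e.g. .../2008-09-05
cp -al "$live" "$snap"
```

rsync with --link-dest against the previous snapshot does the same job and copes better with files that change in place, since cp -al shares the inode and would let an in-place modification bleed into old snapshots.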
Since the hashing is going to take a while, I'll hop on IRC and pass
you a link to a tar with the file name matches. It turns out that I have
*most* of them based on name match alone. (Dunno why my earlier count
was wrong… perhaps a Unicode handling bug on my part; I'd just woken
up when I sent my prior email.)