Daniel Kinzler wrote:
A more conventional solution would be to have two more copies of the files, on the same server, synced every, say, 24 hours: backup A -> backup B, then live mirror -> backup A. But this would require three times the space. Considering we currently have 5TB worth of media files (does this include thumbnails?), and the new server will have 24TB of space, this could work for a while. But taking exponential growth into account, it wouldn't last long.
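For illustration only, here is a minimal sketch of that rotation; the paths and the use of rsync are assumptions on my part, not part of the proposal:

#!/usr/bin/env python
"""Sketch of the rotation described above (hypothetical paths).

Rotates backup A into backup B, then refreshes backup A from the live
mirror, so B always lags A by one sync cycle (e.g. 24 hours).
"""
import subprocess

LIVE_MIRROR = "/srv/media/live"    # assumed path to the live mirror
BACKUP_A = "/srv/media/backup-a"   # assumed path, one cycle behind live
BACKUP_B = "/srv/media/backup-b"   # assumed path, two cycles behind live


def sync(src, dst):
    # rsync -a preserves metadata; --delete keeps dst an exact copy of src
    subprocess.check_call(["rsync", "-a", "--delete", src + "/", dst + "/"])


if __name__ == "__main__":
    sync(BACKUP_A, BACKUP_B)      # backup A -> backup B
    sync(LIVE_MIRROR, BACKUP_A)   # live mirror -> backup A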
Tripling the space requirements seems like overkill, though. Maybe there's a smarter solution. Ideas?
--daniel
It seems worth mentioning how I am currently replicating the Commons files.
First, there's a bot watching file uploads in order to scan them, so all files are usually already on the box. Then I run a script to make quasi-snapshots of Commons. They aren't real snapshots: since I go through the API, they don't represent an exact point in time. The Toolserver doesn't have that problem; because it keeps a copy of the Commons database, it can directly query a snapshot of the image table.
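Roughly, the listing part of such a quasi-snapshot could look like the sketch below. This is not my actual script; the endpoint and parameters are the public MediaWiki API's list=allimages query, and it is Python 3:

"""Sketch: list current Commons files with their SHA-1 hashes through the
public API, one batch at a time. Because the batches arrive over several
requests, the result is only a quasi-snapshot, unlike a single query
against the Toolserver's image table."""
import json
import urllib.parse
import urllib.request

API = "https://commons.wikimedia.org/w/api.php"


def list_images():
    params = {
        "action": "query",
        "list": "allimages",
        "aiprop": "sha1|timestamp|url",
        "ailimit": "500",
        "format": "json",
    }
    while True:
        url = API + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        for img in data["query"]["allimages"]:
            yield img["name"], img["sha1"], img["url"]
        if "continue" not in data:
            break
        params.update(data["continue"])  # resume where the last batch ended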
For each image, the script looks for a copy in the previous snapshots as well as in the uploads copy (verifying by hash). Only a few images are not found and thus need to be downloaded. All others are hardlinked.
As each run downloads into a different folder, I get snapshots of different points in time. Deleted images are simply not hardlinked. The system runs XFS, but the script doesn't require any special filesystem feature beyond ordinary Unix hardlinks, although a filesystem that doesn't reserve a fixed number of inodes at creation time is really encouraged.
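A much simplified sketch of that hardlinking step (again, not the real script; the function and paths are illustrative only):

"""For each image in the new quasi-snapshot, look for a copy with the same
SHA-1 in the previous snapshot directories (or the uploads copy) and
hardlink it; only files found nowhere need to be downloaded."""
import hashlib
import os
import urllib.request


def sha1_of(path):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def place_file(name, sha1, url, snapshot_dir, previous_dirs):
    dest = os.path.join(snapshot_dir, name)
    # Look for an identical copy in earlier snapshots or the uploads copy.
    for old_dir in previous_dirs:
        candidate = os.path.join(old_dir, name)
        if os.path.isfile(candidate) and sha1_of(candidate) == sha1:
            os.link(candidate, dest)   # hardlink: no extra data blocks used
            return "linked"
    # Not found anywhere: one of the few files that actually gets downloaded.
    urllib.request.urlretrieve(url, dest)
    return "downloaded"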
You may spend a few GB per snapshot on inodes (about 1GB per 4 million files given an inode size of 256 bytes, since 4,000,000 x 256 bytes is roughly 1GB) and some more on directory contents, but that's completely acceptable, as the size of the new files found per snapshot is an order of magnitude larger.
Some caveats: the oldimage table has 'unexpected' entries. Don't make assumptions such as "a filename can't appear twice" or "there will always be a file".
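In practice that caveat means code walking oldimage should group rows by name rather than assume uniqueness, and should tolerate rows with no file behind them. A hypothetical illustration (column names are the standard oi_name / oi_archive_name fields of the MediaWiki oldimage table):

"""Group old file versions by name, skipping rows whose file is missing."""
import os
from collections import defaultdict


def group_old_versions(rows, archive_dir):
    # rows: iterable of (oi_name, oi_archive_name) pairs from oldimage
    versions = defaultdict(list)   # a name can appear more than once
    for name, archive_name in rows:
        path = os.path.join(archive_dir, archive_name)
        if not os.path.isfile(path):
            continue               # there isn't always a file behind the row
        versions[name].append(path)
    return versions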
Of course, the code is available. If I can be of help... just ask :)
Yours, Platonides