Timwi wrote:
> * The other thing I like to mention is LiveJournal. Their database backend
> is pretty impressive and handles the load of almost a million active users.
> They have never even dreamt of placing journal entries, audio posts or user
> pictures into files in a file system. The way they have it now they can
> easily create more database clusters and move users (and their data) around
> between clusters using a little Perl script. With a file system, that would
> be quite a bit more difficult.
Not quite. LiveJournal identified blob transfer through their databases as a
bottleneck and moved the blobs out to eliminate it, as part of a broader effort
to take the databases out of the critical path, which in general meant shifting
database work into memcached caches wherever possible. This move started in
earnest in early November.
LJ blobs (images and audio) now live on a NetApp filer, completely separate
from the user database cluster machines. You can see the overall architecture here:
http://www.livejournal.com/community/lj_backend/974.html
A description of the move out of the database machines is here:
http://www.livejournal.com/community/lj_backend/502.html
Without the third-party Akamai servers, they would serve blobs via their blob
component, which caches in memcached and loads from the NetApp filesystem on a
cache miss.
Addressing your concerns about integrity: back up the images, then the
database, then any images created while the database backup ran. With
timestamps incorporated into every image name, this guarantees that every image
the backed-up database references is available, at the cost of keeping some
extra images: those deleted just before the backup and those created during it.
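The ordering above can be sketched as follows; the function and store names are
hypothetical, and lists stand in for the real image store and database dump:

```python
import time


def timestamped_name(basename):
    # Timestamped image names, e.g. "userpic" -> "userpic.1101772800".
    return "%s.%d" % (basename, int(time.time()))


def backup(image_store, dump_database):
    # 1. Archive the images that exist right now.
    archive = set(image_store)
    # 2. Dump the database (new images may arrive while this runs).
    db_snapshot = dump_database()
    # 3. Sweep up any images created during the dump, so the archive is a
    #    superset of everything the dumped database can reference.
    archive |= set(image_store)
    return archive, db_snapshot
```

Timestamped names mean an image is never overwritten in place, so the archive
only ever grows: the database snapshot can reference nothing outside it.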
LJ caches data in memcached, with the memcached servers (sometimes several of
them on one machine) residing on the web servers (page builders), because the
web servers are CPU-bound and have RAM to spare. That lets the machines do
double duty.
Like memcached, the blob component is open source.