Timwi wrote:
- The other thing I like to mention is LiveJournal. Their database backend is pretty impressive and handles the load of almost a million active users. They have never even dreamt of placing journal entries, audio posts or user pictures into files in a file system. The way they have it now they can easily create more database clusters and move users (and their data) around between clusters using a little Perl script. With a file system, that would be quite a bit more difficult.<<
Not quite. LiveJournal identified blob transfer from their databases as a bottleneck and moved the blobs out to eliminate it, as part of a broader effort to take the databases off the critical path, which in general meant shifting database work into memcached caches whenever possible. That migration started in earnest in early November.
LJ blobs (images and audio) are now on a NetApp box filesystem, completely apart from their user database cluster machines. You can see the overall architecture here:
http://www.livejournal.com/community/lj_backend/974.html
A description of the move out of the database machines is here:
http://www.livejournal.com/community/lj_backend/502.html
Without the third-party Akamai servers, they would be serving blobs via their blob component, which caches in memcached and loads from the NetApp filesystem on a cache miss.
Addressing your concerns about integrity, the way to do that is to back up the images, then the database, then any new images. Since all image names incorporate timestamps, this guarantees that every image the backed-up database references is available, at the cost of keeping some extra images: those deleted before the backup started and those created while it ran.
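That three-step ordering can be sketched as follows. This is a toy illustration under assumed names (dump_database is a placeholder for the real dump step, and the directory layout is invented), not a description of LiveJournal's actual backup scripts:

```python
# Sketch of the backup ordering: images first, then the database,
# then any images created while the first pass and the dump ran.
# All names and paths are hypothetical.

import shutil
import time
from pathlib import Path

def dump_database(backup_dir):
    """Placeholder for the real database dump step."""
    (Path(backup_dir) / "db.dump").write_text("-- dump --")

def backup(image_dir, backup_dir):
    image_dir, backup_dir = Path(image_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Pass 1: copy every image that exists now.
    start = time.time()
    for img in image_dir.iterdir():
        shutil.copy2(img, backup_dir / img.name)

    # Dump the database; it can only reference images that either
    # existed during pass 1 or will be caught in pass 2.
    dump_database(backup_dir)

    # Pass 2: copy images created or modified since pass 1 began.
    for img in image_dir.iterdir():
        if img.stat().st_mtime >= start:
            shutil.copy2(img, backup_dir / img.name)
```

Because image files are never renamed (the timestamped name is fixed at creation), copying a file twice is harmless, and the backed-up database can never point at an image missing from the backup.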
LJ caches data into memcached with the memcached servers (sometimes several of them on one machine) residing on the web servers (page builders), because the web servers are CPU-bound and have RAM to spare. Lets the machines do double duty.
Like memcached, the blob component is open source.