Timwi wrote:
- The other thing I like to mention is LiveJournal. Their database backend is pretty impressive and handles the load of almost a million active users. They have never even dreamt of placing journal entries, audio posts or user pictures into files in a file system. The way they have it now they can easily create more database clusters and move users (and their data) around between clusters using a little Perl script. With a file system, that would be quite a bit more difficult.
Not quite. LiveJournal identified blob transfer from their databases as a bottleneck and moved the blobs out of the databases to eliminate it, as part of their work on removing the databases as a whole from bottleneck status, which in general meant shifting database things into memcached caches whenever possible. This moving started in earnest in early November.
LJ blobs (images and audio) are now on a NetApp box filesystem, completely apart from their user database cluster machines. You can see the overall architecture here:
http://www.livejournal.com/community/lj_backend/974.html
A description of the move out of the database machines is here:
http://www.livejournal.com/community/lj_backend/502.html
Without the third party Akamai servers they would be serving via their blob component, which would cache in memcached, loading into that from the NetApp filesystem if not in the cache already.
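To illustrate the lookup path just described (cache first, filesystem on a miss), here is a minimal sketch. It is not LJ's blob component: the class name, the blob ID and the plain dict standing in for a memcached client are all assumptions made for the example.

```python
import tempfile
from pathlib import Path


class BlobReader:
    """Read-through cache: try the cache first, fall back to the filesystem."""

    def __init__(self, root: Path, cache: dict):
        self.root = root    # hypothetical mount point of the blob filesystem
        self.cache = cache  # a dict standing in for a memcached client
        self.misses = 0

    def get(self, blob_id: str) -> bytes:
        data = self.cache.get(blob_id)
        if data is None:
            # Cache miss: load from the filesystem and remember the result.
            self.misses += 1
            data = (self.root / blob_id).read_bytes()
            self.cache[blob_id] = data
        return data


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "userpic-1").write_bytes(b"fake image bytes")
        reader = BlobReader(root, cache={})
        first = reader.get("userpic-1")   # miss, read from disk
        second = reader.get("userpic-1")  # served from the cache
        return first, second, reader.misses
```

Only the first request touches the filesystem; repeat requests for the same blob are answered from memory.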
Addressing your concerns about integrity, the way to do that is to back up the images, then the database, then any new images. With all image names incorporating timestamps, that ensures that every image the database refers to is available in the backup, at the cost of keeping some extra images: those deleted before the database backup and new ones created during it.
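A small simulation of that ordering, as a sketch only. The stores are plain Python containers, the image names are made up, and all user activity is assumed to land at one point between the image copy and the database dump. It does not model a deletion that arrives after the database dump but before the final image pass.

```python
def run_backup(image_store: dict, database: set, activity=None):
    """Back up images, then the database, then any new images."""
    # Step 1: copy every image that exists right now.
    image_backup = dict(image_store)

    # Users keep uploading and deleting while the backup runs (simulated here).
    if activity is not None:
        activity(image_store, database)

    # Step 2: dump the database (the set of image names it references).
    db_backup = set(database)

    # Step 3: copy images that appeared since step 1. Names carry a timestamp
    # and are never reused, so an existing backup entry is never overwritten.
    for name, data in image_store.items():
        image_backup.setdefault(name, data)

    return image_backup, db_backup


def demo():
    store = {"pic-1070000000": b"a", "pic-1070000100": b"b"}
    database = set(store)

    def activity(store, database):
        # One image is deleted and one is uploaded during the backup.
        del store["pic-1070000000"]
        database.discard("pic-1070000000")
        store["pic-1070000200"] = b"c"
        database.add("pic-1070000200")

    image_backup, db_backup = run_backup(store, database, activity)
    missing = db_backup - set(image_backup)  # referenced but not backed up
    extra = set(image_backup) - db_backup    # backed up but not referenced
    return missing, extra
```

In this run `missing` comes out empty and `extra` holds only the image that was deleted mid-backup, which is the harmless direction.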
LJ caches data into memcached with the memcached servers (sometimes several of them on one machine) residing on the web servers (page builders), because the web servers are CPU-bound and have RAM to spare. Lets the machines do double duty.
Like Memcached, the blob component is open source.
user_Jamesday wrote:
status, which in general meant shifting database things into memcached caches whenever possible. This moving started in earnest in early November.
Very interesting.
Perhaps I should point out that my reported experience with blob I/O bottleneck problems has been in systems where the database contents were buffered in RAM, and not related to disk I/O. In theory, such a database should be fast enough to do anything that memcached promises to do. In practice, poorly designed software can easily slow down any hardware.
user_Jamesday wrote:
LJ blobs (images and audio) are now on a NetApp box filesystem, completely apart from their user database cluster machines.
I didn't realise that this was a filesystem. It was constantly referred to as a "NetApp" and I had no idea what it was, so I assumed it was some sort of database or cache.
I also thought that functionally it was really just a cache, and I thought the actual canonical data was still in the LJ database. Thanks for clearing up that they are actually moving stuff out of the database.
But then again, the way LiveJournal handles userpics (or blobs in general) makes backing up a consistent state easier for two reasons: (1) The same blob ID is never used for different blobs. (2) The same blob, for as long as it exists, is always found under the same ID.
Thus, all they need to do to back everything up is to back up the database, and then the NetApp. Then you might have blobs in the NetApp that aren't referenced in the database, but that's not nearly as bad as having blobs referenced in the database that do not exist in the NetApp.
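A sketch of why those two properties help, using a made-up in-memory store rather than anything LJ actually runs. IDs come from a counter that only grows, so a deleted ID is never handed out again. The demo assumes no blob is deleted between the database snapshot and the blob snapshot; a deletion in that window would still leave a dangling reference.

```python
import itertools


class WriteOnceBlobStore:
    """Hypothetical blob store: each ID is assigned once and never reused."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._blobs = {}

    def put(self, data: bytes) -> int:
        blob_id = next(self._next_id)  # the counter only grows
        self._blobs[blob_id] = data
        return blob_id

    def delete(self, blob_id: int) -> None:
        del self._blobs[blob_id]

    def snapshot(self) -> dict:
        return dict(self._blobs)


def dangling_references(db_snapshot: set, blob_snapshot: dict) -> set:
    """IDs the database refers to that are absent from the blob backup."""
    return db_snapshot - set(blob_snapshot)


def demo():
    store = WriteOnceBlobStore()
    a = store.put(b"pic a")
    b = store.put(b"pic b")
    database = {a, b}

    db_snapshot = set(database)       # back up the database first
    c = store.put(b"pic c")           # an upload arrives in between
    database.add(c)
    blob_snapshot = store.snapshot()  # then back up the blob store

    dangling = dangling_references(db_snapshot, blob_snapshot)
    orphans = set(blob_snapshot) - db_snapshot

    store.delete(a)
    d = store.put(b"pic d")           # gets a fresh ID, not a's old one
    return dangling, orphans, (a, b, c, d)
```

Here the backup ends up with one unreferenced blob and no missing ones, and the blob stored after the deletion gets a new ID instead of recycling the old one.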
So, then, I guess Wikipedia could do that too.
Timwi
Timwi wrote:
But then again, the way LiveJournal handles userpics (or blobs in general) makes backing up a consistent state easier for two reasons: (1) The same blob ID is never used for different blobs. (2) The same blob, for as long as it exists, is always found under the same ID.
Yes, this is the way I would have designed it too. However, NetApps are quite expensive NFS file servers, and they are soooo 1999. :-)
wikitech-l@lists.wikimedia.org