I'm still wondering how the images are stored currently: on the hard disk of a single server? They don't seem to be in the database. Once we use a cluster of Apaches, the files need to be available to all servers. The easiest solution might be to mount the media dir from a fileserver (with a big RAID array) via NFS.
I think the sessions are stored in the database, so this is no problem.
Gabriel Wicke
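A minimal sketch of the mount Gabriel describes, assuming a fileserver named "fileserver1" (the hostname, paths, and subnet here are all placeholders):

```
# /etc/exports on the fileserver -- restrict the export to the
# web-server subnet so only the Apaches can mount it:
/export/upload  10.0.0.0/24(rw,sync,no_subtree_check)

# /etc/fstab entry on each Apache:
fileserver1:/export/upload  /var/www/upload  nfs  rw,hard,intr  0  0
```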
On Thu, Jan 08, 2004 at 04:13:46PM +0100, Gabriel Wicke wrote:
I'm still wondering how the images are stored currently: on the hard disk of a single server? They don't seem to be in the database.
Single server, not in the database currently.
Once we use a cluster of Apaches, the files need to be available to all servers. The easiest solution might be to mount the media dir from a fileserver (with a big RAID array) via NFS.
And replicate it to a secondary machine using rsync, for backup purposes. Since we have Heartbeat on the Squids, could those form an NFS cluster?
Regards,
JeLuF
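The rsync replication Jens mentions could be as simple as a cron job on the fileserver (hostnames and paths are placeholders; whether to also mirror deletions with --delete depends on how much protection against accidental removals is wanted):

```
# crontab on the fileserver: nightly copy of the upload dir to the backup box
30 3 * * *  rsync -a /export/upload/ backup1:/export/upload/
```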
On Thu, 08 Jan 2004 16:28:41 +0100, Jens Frank wrote:
On Thu, Jan 08, 2004 at 04:13:46PM +0100, Gabriel Wicke wrote:
I'm still wondering how the images are stored currently: on the hard disk of a single server? They don't seem to be in the database.
Single server, not in the database currently.
Once we use a cluster of Apaches, the files need to be available to all servers. The easiest solution might be to mount the media dir from a fileserver (with a big RAID array) via NFS.
And replicate it to a secondary machine using rsync, for backup purposes. Since we have Heartbeat on the Squids, could those form an NFS cluster?
In theory, yes. The disadvantage is security: the Squids are the firewall, while separate fileservers could be configured to accept connections only from the Apaches. Having both the database and the fileservers behind the firewall and behind the Apaches should be much safer.
There are network filesystems like AFS that could probably do a better job of immediate replication than NFS, but I don't know. We should figure this out soon...
Whatever the configuration is, this might be a good application for the current servers. The main server should have RAID, but the backup one could most probably do without. These servers won't see a lot of hits: basically one write and one read per image/file within whatever the Squid config's timeout value is. But they should be connected with Heartbeat to provide failover. I'm not sure if there's a good test module for fileservers that can detect the more subtle failures.
Gabriel
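For the Heartbeat failover mentioned above, a two-node Heartbeat (v1) setup might look like this (host names, the floating service IP, and the init script name are all assumptions):

```
# /etc/ha.d/haresources -- identical on both fileservers; "files1" is the
# preferred node, 10.0.0.50 the floating service IP, and the NFS init
# script is started on whichever node holds the resources:
files1 IPaddr::10.0.0.50 nfs-kernel-server

# /etc/ha.d/ha.cf (excerpt):
node files1 files2
auto_failback on
```

As noted, Heartbeat's default checks only detect dead nodes, not the more subtle failures Gabriel worries about.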
About AFS: http://www.openafs.org/ http://www.angelfire.com/hi/plutonic/afs-faq.html#sub1
Sounds interesting, especially the failover/caching part. I have no experience with this however.
Gabriel Wicke
On Thu, 08 Jan 2004 17:03:43 +0100, Gabriel Wicke wrote:
About AFS: http://www.openafs.org/ http://www.angelfire.com/hi/plutonic/afs-faq.html#sub1
From the OpenAFS website:
Efficiency Boosters: Replication and Caching
AFS incorporates special features on server machines and client machines that help make it efficient and reliable.
On server machines, AFS enables administrators to replicate commonly-used volumes, such as those containing binaries for popular programs. Replication means putting an identical read-only copy (sometimes called a clone) of a volume on more than one file server machine. The failure of one file server machine housing the volume does not interrupt users' work, because the volume's contents are still available from other machines. Replication also means that one machine does not become overburdened with requests for files from a popular volume.
Gabriel Wicke wrote:
I'm still wondering how the images are stored currently: on the hard disk of a single server? They don't seem to be in the database. Once we use a cluster of Apaches, the files need to be available to all servers. The easiest solution might be to mount the media dir from a fileserver (with a big RAID array) via NFS.
Does NFS still stink? I know that it used to be common wisdom that you never webserve stuff from NFS, because the performance was disastrous. Probably some smart Apache rewriting could solve this for the most part, i.e. if a server doesn't have a requested image locally, it tries to get it via NFS and *keeps* it locally for future reference.
Probably Brion has more good ideas about how to handle images on our proposed architecture.
--Jimbo
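The cache-on-miss idea Jimmy describes could be sketched like this; in practice it would probably be an Apache 404/rewrite handler rather than a standalone script, and all paths are placeholders:

```shell
# Serve an image from the local cache, falling back to the NFS mount
# and keeping a copy for future requests.
NFS_DIR=${NFS_DIR:-/mnt/upload}            # NFS-mounted master copy
CACHE_DIR=${CACHE_DIR:-/var/cache/upload}  # per-server local cache

fetch_image() {
    img="$1"
    if [ ! -f "$CACHE_DIR/$img" ]; then
        # Cache miss: pull the file over NFS and keep it locally.
        mkdir -p "$(dirname "$CACHE_DIR/$img")"
        cp "$NFS_DIR/$img" "$CACHE_DIR/$img"
    fi
    printf '%s\n' "$CACHE_DIR/$img"   # path to hand to the webserver
}
```

Subsequent requests for the same image then never touch NFS, which addresses the performance worry.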
On Thu, Jan 08, 2004 at 02:10:32PM -0800, Jimmy Wales wrote:
Gabriel Wicke wrote:
I'm still wondering how the images are stored currently: on the hard disk of a single server? They don't seem to be in the database. Once we use a cluster of Apaches, the files need to be available to all servers. The easiest solution might be to mount the media dir from a fileserver (with a big RAID array) via NFS.
Does NFS still stink? I know that it used to be common wisdom that you never webserve stuff from NFS, because the performance was disastrous. Probably some smart Apache rewriting could solve this for the most part, i.e. if a server doesn't have a requested image locally, it tries to get it via NFS and *keeps* it locally for future reference.
There's a program we could run in front of our webservers, I think it's called Squid, that can solve this ;-) Just enable caching for those. Sending a PURGE if an image is overwritten might be OK.
JeLuF
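The PURGE Jens suggests is just a small HTTP request sent to each Squid; a sketch that builds it (the URL, host, and Squid names are placeholders, and the PURGE method has to be allowed by an ACL in squid.conf):

```shell
# Build the raw HTTP request that invalidates one cached object on a Squid.
purge_request() {
    path="$1"   # e.g. /upload/Foo.jpg
    host="$2"   # e.g. upload.wikimedia.org
    printf 'PURGE %s HTTP/1.0\r\nHost: %s\r\n\r\n' "$path" "$host"
}

# On upload, something like this would run once per Squid, e.g.:
#   purge_request /upload/Foo.jpg upload.wikimedia.org | nc squid1 80
```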
On Jan 8, 2004, at 14:10, Jimmy Wales wrote:
Gabriel Wicke wrote:
I'm still wondering how the images are stored currently: on the hard disk of a single server? They don't seem to be in the database.
There is a table in the database that _lists_ images, but the uploaded files are simply stored in the web server's local filesystem. For the primitive en/en2 split, we just force all image-related actions to happen on one server, but that's not very satisfactory.
Probably Brion has more good ideas about how to handle images on our proposed architecture.
Well, the idea I floated a while ago, which was somewhat unpopular, was to basically keep the "real" images as blobs in the database and let the web servers cache them to the local filesystem as needed.
I should point out that the database holds far more data in the revision history than we have in uploaded images. en.wikipedia.org's uploads directory totals about 1.2GB, de's is <300MB. If we're talking about saving 14GB from the database by compressing old revisions, 2 gigs or so of images seem a relatively minor burden.
-- brion vibber (brion @ pobox.com)
On Fri, Jan 09, 2004 at 12:25:05AM -0800, Brion Vibber wrote:
On Jan 8, 2004, at 14:10, Jimmy Wales wrote:
Gabriel Wicke wrote:
Well, the idea I floated a while ago, which was somewhat unpopular, was to basically keep the "real" images as blobs in the database and let the web servers cache them to the local filesystem as needed.
I should point out that the database holds far more data in the revision history than we have in uploaded images. en.wikipedia.org's uploads directory totals about 1.2GB, de's is <300MB. If we're talking about saving 14GB from the database by compressing old revisions, 2 gigs or so of images seem a relatively minor burden.
It was unpopular due to the already high load on the Apaches and databases. If the images are cached by the Squids, it shouldn't make a big difference from a performance point of view whether they are stored in the DB or in the filesystem.
Putting them into the DB would save a lot of work: no need to have redundant file servers.
Regards,
JeLuF
On Fri, 09 Jan 2004 13:54:28 +0100, Jens Frank wrote:
Putting them into the DB would save a lot of work: no need to have redundant file servers.
We can use the DB servers for now and move the files to separate servers if necessary. I'm against placing them in the DB; this would create scaling problems in the future, and it wouldn't be faster than NFS. Also, image scaling is easier when the files are already on the filesystem. Setting up NFS is easy; I don't know about AFS.
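The image-scaling point amounts to shelling out to a converter on local files, e.g. with ImageMagick (assuming it is installed; filenames are placeholders):

```
# Generate a 120px-wide thumbnail from the locally stored original:
convert /var/www/upload/Foo.jpg -resize 120x /var/www/thumb/120px-Foo.jpg
```

With the originals only reachable as DB blobs, each scaling run would first have to write the blob out to a temporary file.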
wikitech-l@lists.wikimedia.org