Albert seems to be connected to the internal network on a 100Mbps link; the backup files are stored on yongle, and accessed over NFS through that link.
Fast, disk-intensive things like making the image tarballs or md5sum'ing the database dumps saturate the line; the apaches have to fight for bandwidth on that same line while trying to load images from albert's NFS server when rendering pages... which leads to waits, hangs, and general suckiness.
If we run the backup process on yongle itself (or move the actual dump files it uses?), it might work better... and/or, if that 100Mbps link is supposed to be gigabit, we need to figure out why it negotiated the wrong speed.
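(A rough sketch of how to check what the card actually negotiated, assuming the internal interface is eth0 -- substitute the right one -- and that we have root on albert:)

  # Show negotiated speed/duplex on the internal interface (eth0 is a guess)
  ethtool eth0 | grep -E 'Speed|Duplex|Auto-negotiation'

  # If it came up at 100Mb/s on a gigabit port, restarting auto-negotiation
  # sometimes clears it up (or points at a bad cable/port)
  ethtool -r eth0

  # To confirm the link really is pegged during a tarball run, sar from
  # sysstat (if it's installed) shows per-interface throughput:
  sar -n DEV 1 10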
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
Albert seems to be connected to the internal network on a 100Mbps link; the backup files are stored on yongle, and accessed over NFS through that link.
I'm full of crap; there are a bunch of misleading symlinks around. :)
Unfortunately, writes to the disk array on albert really do seem that fricking slow. We're probably going to have to move the uploads to another machine, or something. Sigh.
It looks great with smaller files, but once a file outgrows what the cache can absorb, the writes effectively turn synchronous and performance goes to crap.
A series of tests with 'time dd if=/dev/zero of=bigfile bs=1M count=<size>':
  size (MB)   speed (MB/sec)
        50    151
       200    310
       400    152
       800     37
      1024      8.4
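(A caveat on those numbers: without a sync, dd at the smaller sizes is mostly timing the page cache rather than the array. A sketch of a cache-independent rerun, with the 800MB size purely illustrative:)

  # Force everything to disk before 'time' reports, so the page cache
  # doesn't inflate the result (800MB is just an example size):
  time sh -c 'dd if=/dev/zero of=bigfile bs=1M count=800 && sync'

  # Clean up the test file afterwards
  rm bigfile

(Where the local dd supports it, conv=fdatasync on the dd command does the same thing in one step.)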
-- brion vibber (brion @ pobox.com)
Instead of moving the uploads, why not move only the backup files, for example onto a dedicated storage server :) (where we could store logs too)
Shaihulud
On Friday, Chad and I will be in the colo moving a lot of stuff to the new switch. This will free up ports-o-plenty, so everything from Wikimedia can go on the gigabit switches for sure.
I'm going to give that 100Mbps switch back to Bomis (it's theirs anyway) and we're going to hang them off a single port of the new switch. This will make it easy to measure their traffic and charge them accordingly.
(We're changing the contract with the colo so that Wikimedia pays for hosting directly and charges Bomis for whatever they use. This will mean that, for the first time, Wikimedia will be paying fully for our bandwidth and hosting, instead of Bomis picking up more than its fair share of the tab.)
--Jimbo