2009/7/13 Domas Mituzas midom.lists@gmail.com:
Precis: if the file system is very busy (being hammered) *and* it's over 85% full, the block allocator can get stuck trying to work out the *very best* allocation rather than one that'll do and let it get on with other work, to the point where you see CPU go through the roof, with 80% system CPU and a very unresponsive system. You can't stop this without rebooting the box.
This is exactly what we're seeing, except that we could get out of it by dropping older snapshots.
Yeah - cutting down how full the file system is.
Sun acknowledged it as a bug and it'll be fixed in a future release; they gave us a hotpatch. The workaround? Keep the ZFS filesystem in question under 70% full ...
:-) hehehehehe, 'the heck beaten out of it' sounds like what we tend to do to our systems at wikimedia ;-)
It's useful testing, and you can be sure Sun will be interested in your results in detail; we're a reasonably famous site! A coworker spoke to the Sun kernel engineer tearing his hair out over this one ...
I fear the answer re: ZFS is to some extent "don't do that then" until it's fixed. Of course, you want snapshots. It's a tricky one.
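In the meantime, something along these lines could at least warn before a pool drifts past the 70% workaround figure. It's a rough sketch, not anything Sun supplied: it assumes pool-level capacity as reported by `zpool list -H -o name,capacity` is the number that matters, and the function names and the threshold constant are made up for illustration.

```python
import subprocess

# Assumption: 70% is the workaround figure quoted above; adjust to taste.
THRESHOLD_PCT = 70


def parse_capacity(zpool_output):
    """Parse `zpool list -H -o name,capacity` output into {pool: percent}."""
    pools = {}
    for line in zpool_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        name, cap = fields
        cap = cap.rstrip("%")
        if cap.isdigit():  # unavailable pools report "-" and are skipped
            pools[name] = int(cap)
    return pools


def pools_over(pools, threshold=THRESHOLD_PCT):
    """Return the sorted names of pools at or above the threshold."""
    return sorted(name for name, pct in pools.items() if pct >= threshold)


def check_live():
    """Run zpool on this host and return the pools that are too full."""
    out = subprocess.run(
        ["zpool", "list", "-H", "-o", "name,capacity"],
        capture_output=True, text=True, check=True,
    ).stdout
    return pools_over(parse_capacity(out))
```

Run `check_live()` from cron and mail out whatever it returns; when a pool shows up, that's the cue to start dropping older snapshots before the allocator gets into trouble.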
- d.