2009/7/17 Aryeh Gregor <Simetrical+wikilist@gmail.com>:
> On Thu, Jul 16, 2009 at 10:18 PM, David Gerard <dgerard@gmail.com> wrote:
>> Honestly, a reboot will be quicker.
> Why would a reboot help? The system would still have too little disk space (if that's what's actually causing the problem).
When you beat the crap out of Solaris, it needs rebooting way more than a Unix should. We don't tell the NT admins about this, they get snarky.
The ZFS bug manifests when the file system is (a) very full and (b) getting lots of writes. The block allocation algorithm uses up all the CPU trying for perfection rather than adequacy, so system CPU goes through the roof and the system turns to molasses. Only way out: reboot. Stopping writes or severely reducing the disk usage didn't work for us on Solaris 10.
After a reboot, don't write to the file system, just read the data off it.
Then start over with a lot less data on that FS: 70% full or less.
Hard part: being able to take the machine out of service at all. Harder part: moving services off the box while keeping disk under 70%.
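If you want something to nag you before the box gets there, a rough sketch of a usage check against that 70% line might look like the following. This is only an illustration: the function names and the default path are made up for the example, and it reports what the OS says about the mounted file system, which on ZFS may not match the pool-level capacity that `zpool list` shows.

```python
import shutil

THRESHOLD = 0.70  # the 70% ceiling mentioned above; adjust to taste


def used_fraction(total, used):
    """Return used/total as a fraction, or 0.0 if total is zero."""
    return used / total if total else 0.0


def over_threshold(path, threshold=THRESHOLD):
    """Return (is_over, fraction_used) for the file system holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    frac = used_fraction(usage.total, usage.used)
    return frac > threshold, frac


if __name__ == "__main__":
    path = "/"  # hypothetical default; point this at the mount you care about
    over, frac = over_threshold(path)
    status = "OVER" if over else "ok"
    print(f"{path}: {frac:.0%} used ({status}, limit {THRESHOLD:.0%})")
```

Run it from cron and alert on "OVER"; the point is to find out well before the allocator starts thrashing, not after.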
- d.