2009/7/17 Aryeh Gregor <Simetrical+wikilist(a)gmail.com>:
> On Thu, Jul 16, 2009 at 10:18 PM, David Gerard <dgerard(a)gmail.com> wrote:
>> Honestly, a reboot will be quicker.
>
> Why would a reboot help? The system would still have too little disk
> space (if that's what's actually causing the problem).
When you beat the crap out of Solaris, it needs rebooting way more
than a Unix should. We don't tell the NT admins about this; they get
snarky.
The ZFS bug manifests when the file system is (a) very full and (b)
getting lots of writes. The block allocation algorithm uses up all the
CPU trying for perfection rather than adequacy, so system CPU goes
through the roof and the system turns to molasses. Only way out:
reboot. Stopping writes or severely reducing the disk usage didn't
work for us on Solaris 10.
After a reboot, don't write to the file system; just read the data off it.
Then start over with a lot less data on that FS: 70% full or less.
Hard part: being able to take the machine out of service at all.
Harder part: moving services off the box while keeping disk under 70%.
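For keeping an eye on that 70% line, something like the following
might do as a rough sketch. The names used_fraction, over_threshold
and THRESHOLD are made up for illustration, and it assumes Python's
shutil.disk_usage gives a good enough picture. That is the
file-system-level view, which on ZFS may not match what the pool
itself reports, so treat it as an approximation.

```python
import shutil

# Assumption: 0.70 mirrors the "70% or less" rule of thumb above.
THRESHOLD = 0.70


def used_fraction(used: int, total: int) -> float:
    """Return the fraction of space in use, or 0.0 if total is zero."""
    return used / total if total else 0.0


def over_threshold(path: str, threshold: float = THRESHOLD) -> bool:
    """Return True if the file system holding `path` is fuller than `threshold`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return used_fraction(usage.used, usage.total) > threshold
```

Calling over_threshold("/some/mount") from cron and alerting when it
returns True would give some warning before the box gets anywhere
near full. The mount point here is a placeholder.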
- d.