On Sat, Jan 28, 2012 at 10:56:27AM +0200, Ariel T. Glenn wrote:
> Sure, but that assumes you are not already using those other cores
> for something else. In our case, we are ;-)
So are we: the Perl scripts use e.g. Parallel::ForkManager
(http://search.cpan.org/~dlux/Parallel-ForkManager/), so we already run
several bunzip2 processes in parallel. Despite SSDs, RAID arrays etc.,
the machines still have an abundance of CPU power, so naturally I would
like to throw CPU time at some problems instead of watching 2-4% CPU
usage and disk waits. :-)
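The parallel part looks roughly like this (just a sketch, not our
production script; the file glob and the worker cap of 8 are made up):

    #!/usr/bin/perl
    # Sketch only: one single-threaded bunzip2 child per dump file,
    # capped at 8 concurrent workers.
    use strict;
    use warnings;
    use Parallel::ForkManager;

    my $pm = Parallel::ForkManager->new(8);

    for my $archive (glob '*.xml.bz2') {
        $pm->start and next;                 # parent: move on to the next file
        system('bunzip2', '-k', $archive);   # child: decompress on one core
        $pm->finish;
    }
    $pm->wait_all_children;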
pbunzip2 lets me do that under certain circumstances (namely when
uncompressing a pbzip2-packed archive): I can read the whole archive
into memory, decompress it in parallel and then shove the result to the
disks in bulk.
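In code that is roughly the following (again only a sketch; the file
names and the use of IPC::Run3 for the piping are my assumptions here):

    #!/usr/bin/perl
    # Sketch only: slurp a pbzip2-packed archive into RAM, decompress it
    # on all cores via pbzip2 -d, write the result back in one bulk write.
    use strict;
    use warnings;
    use IPC::Run3;                                   # assumed piping helper

    my ($in, $out) = ('dump.xml.bz2', 'dump.xml');   # made-up names

    my $packed = do {
        local $/;
        open my $fh, '<:raw', $in or die "read $in: $!";
        <$fh>;
    };

    # pbzip2 decompresses in parallel only if the archive was packed by pbzip2
    run3 ['pbzip2', '-d', '-c'], \$packed, \my $plain;

    open my $fh, '>:raw', $out or die "write $out: $!";
    print {$fh} $plain;
    close $fh;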
I have no problem continuing with the status quo (several processes on
SMP), but I still see a CPU load of just 50% on average. Despite
hyperthreading, of the 16 cores (8 physical + 8 virtual) only ~4
physical CPUs are under load. I guess the machine will get some VMs to
host then. :-)
The situation changes if we repack the archives with pbzip2: during
unpacking we then see a load of 6 to 6.5 physical CPUs per machine and
a 4x speedup. But repacking is pointless in everyday use, as we inspect
(unpack) every wiki archive just once; we did the repacking for
evaluation purposes only.
OK, let's keep the status quo, but my plea is to keep the growing
number of CPU cores in your dump users' machines in mind for the
future. ;-)
regards,
--
Dipl.-Inf. Univ. Richard C. Jelinek
PetaMem GmbH - www.petamem.com       Managing Director: Richard Jelinek
Human Language Technology Experts    Registered office: Fürth
69216618 Mind Units                  Register court: AG Fürth, HRB-9201