[Labs-l] Launching jobs, any limit?

Petr Bena benapetr at gmail.com
Mon May 26 15:03:52 UTC 2014


I don't even think that the load can be trusted in the case of Labs :P
There are various external issues, such as NFS stalls, that also
inflate the load of a box (in the past I've seen Labs boxes with a load
like 5462492386276, and they were just fine).
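For reference, the per-chunk submission Emilio describes below can be sketched roughly as follows. The chunk glob, the `dump-` job-name prefix, and `process_chunk.sh` are assumptions for illustration; the function prints the `jsub` command lines so you can inspect them first, and pipe the output to `sh` to actually submit.

```shell
#!/bin/bash
# Hypothetical sketch: queue one grid job per dump chunk.
submit_chunks() {
    local chunk
    for chunk in pages-meta-history*.7z; do
        [ -e "$chunk" ] || continue   # skip if the glob matched nothing
        # Print one jsub command per chunk; -N gives the job a name.
        # Pipe this function's output to sh to actually submit the jobs.
        echo "jsub -N dump-${chunk%.7z} ./process_chunk.sh $chunk"
    done
}

submit_chunks
```

As Tim notes below, queueing all 150 at once is fine: the grid scheduler, not the submitter, is responsible for spreading the load.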

On Mon, May 26, 2014 at 4:29 PM, Tim Landscheidt <tim at tim-landscheidt.de> wrote:
> Emilio J. Rodríguez-Posada <emijrp at gmail.com> wrote:
>
>> These days I'm processing Wikipedia dumps. Today I tried English Wikipedia,
>> which is in 150+ chunks (pages-meta-history*.7z).
>
>> I have a bash script that launches the jsub jobs, one job per chunk, so I
>> queued more than 150 jobs. After that, I saw that 95 of them had
>> started and were spread all over the execution nodes.
>
>> I saw the load on some of the nodes reach 250%; is this normal? I
>> stopped all of them because I'm not sure whether I have to launch small
>> batches, 10 at a time or so, or whether it is OK to launch them all and
>> ignore the CPU load of the execution nodes.
>
> The grid should keep the average load below 1, but that is
> its job, not yours :-).  So launching 150 jobs is totally
> fine.  If you see a load of more than 100 % for a prolonged
> time, notifying an admin doesn't hurt, but due to the nature
> of the system -- the grid can only guess what the /future/
> load of a job will be -- outliers are to be expected.
>
> Tim
>
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l


