[Labs-l] Launching jobs, any limit?

Mr. Maximilian Doerr cybernet678 at yahoo.com
Mon May 26 18:51:01 UTC 2014


Yes.  I meant the limit on executing jobs, not on submitting them.  Sorry for the confusion.

On May 26, 2014, at 2:45 PM, Tim Landscheidt <tim at tim-landscheidt.de> wrote:

> Maximilian Doerr <cybernet678 at yahoo.com> wrote in a slightly
> different order:
> 
>>>> These days I'm processing Wikipedia dumps. Today I tried English Wikipedia,
>>>> which is in 150+ chunks (pages-meta-history*.7z).
> 
>>>> I have a bash script that launches the jsub jobs, one job per chunk, so I
>>>> queued more than 150 jobs. After that, I saw that 95 of them had
>>>> started and were spread across the execution nodes.
> 
>>>> I saw the load on some of the nodes reach 250%. Is this normal? I
>>>> stopped them all because I'm not sure whether I have to launch small
>>>> batches, 10 at a time or so, or whether it is OK to launch them all and
>>>> ignore the CPU load of the execution nodes.
> 
>>> The grid should keep the average load below 1, but that is
>>> its job, not yours :-).  So launching 150 jobs is totally
>>> fine.  If you see a load of more than 100 % for a prolonged
>>> time, notifying an admin doesn't hurt, but due to the nature
>>> of the system -- the grid can only guess what the /future/
>>> load of a job will be -- outliers are to be expected.
> 
>> Wait.  The grid should have a limit of 15.  I've hit that limit so many times, I received my own exec node.
> 
> No, the grid should have no limit for the number of jobs
> submitted, but limit the number of jobs executed in parallel
> per user.  Apparently, the latter got lost during the
> migration from pmtpa to eqiad.  I've filed
> https://bugzilla.wikimedia.org/65777 for that.
> 
> Tim
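
Until the per-user limit on executing jobs is back, anyone who prefers to hold back on the client side could do something like the following. This is only a rough sketch under assumptions, not an official recipe: it assumes that `qstat -u USER` prints two header lines before the job rows, that a cap of 15 (the figure mentioned above) is what you want, and `process_chunk.sh` is a hypothetical name standing in for whatever your real per-chunk worker script is.

```shell
#!/usr/bin/env bash
# Sketch: submit one job per dump chunk, but never keep more than
# MAX_ACTIVE of our own jobs (queued or running) on the grid at once.

MAX_ACTIVE=${MAX_ACTIVE:-15}       # assumed cap, taken from the figure in this thread
POLL_SECONDS=${POLL_SECONDS:-60}   # how long to wait before checking the queue again

# Count this user's jobs currently known to the grid.
# Assumption: qstat prints two header lines before the job rows.
count_active_jobs() {
    qstat -u "${USER:-$(id -un)}" 2>/dev/null \
        | awk 'NR > 2 { n++ } END { print n + 0 }'
}

# Submit a single chunk. process_chunk.sh is a hypothetical worker script.
submit_chunk() {
    jsub ./process_chunk.sh "$1"
}

# Submit every argument as its own job, waiting while we are at the cap.
submit_throttled() {
    local chunk
    for chunk in "$@"; do
        while [ "$(count_active_jobs)" -ge "$MAX_ACTIVE" ]; do
            sleep "$POLL_SECONDS"
        done
        submit_chunk "$chunk"
    done
}

# Example call (left commented out so that sourcing this file submits nothing):
# submit_throttled pages-meta-history*.7z
```

The count includes queued jobs as well as running ones, so this is stricter than a limit on parallel execution would be; since submitting is not supposed to be limited at all, it is a courtesy rather than a requirement.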
