Hi, I added this warning today (as previously agreed with DaB.). It is shown when a job is submitted without an arch resource. There are two reasons for this:
# First, the default setting will change next week from "solaris only" to "all servers". This was announced in July (http://lists.wikimedia.org/pipermail/toolserver-l/2012-July/005110.html).
# Second, because of server problems over the last two days, many jobs needed a longer runtime, which led to higher load on willow. Last night some jobs waited for hours until willow was available again, although other servers had unused CPU and memory at the same time.
In most cases you can simply add " -l arch='*' " as an argument to qcronsub/qsub without any problems. Most scripts should run on both Solaris and Linux, but you should test this beforehand to be sure. If your job can currently only run on Solaris, you must add "-l arch=sol" before the default setting changes next week. For more information see https://wiki.toolserver.org/view/Job_scheduling.
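As a rough sketch of the three arch values, here is a small dry-run wrapper. The helper name submit_with_arch and the SUBMIT variable are only made up for illustration; they are not part of the Toolserver tools. By default it only prints the qcronsub command instead of submitting anything.

```shell
#!/bin/sh
# Dry-run sketch: SUBMIT defaults to "echo qcronsub", so the command is only
# printed. Set SUBMIT=qcronsub on a submit host to really submit the job.
SUBMIT="${SUBMIT:-echo qcronsub}"

# submit_with_arch ARCH SCRIPT   (hypothetical helper, for illustration only)
#   ARCH is '*' (any server), sol (Solaris only) or lx (Linux only).
submit_with_arch() {
    arch="$1"
    script="$2"
    case "$arch" in
        '*'|sol|lx) ;;                                  # accepted values
        *) echo "unknown arch: $arch" >&2; return 1 ;;  # reject anything else
    esac
    # $SUBMIT is left unquoted on purpose so "echo qcronsub" splits into two words.
    $SUBMIT -l "arch=$arch" "$script"
}

submit_with_arch '*' "$HOME/bots/dbbot-wm-start.sh"   # may run on any server
submit_with_arch sol "$HOME/bots/dbbot-wm-start.sh"   # Solaris-only job
```

When typing the option directly, keep the quotes around '*' so the shell does not expand it as a filename pattern.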
I also noticed that during the user-store outage on Sunday only one job was waiting (for some hours) because of the missing resource "fs-user-store", while many people complained about their failed jobs. If your job needs a special resource, check whether it can be requested at https://wiki.toolserver.org/view/Job_scheduling#Optional_resources. SGE will then execute your job only when the requested resource is available. If your job is already running and a needed resource goes away, you can also exit your script with code "99". This requeues your job so that it runs once the resource is available again.
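A minimal sketch of the exit code 99 idea inside a job script could look like this. The function name check_store and the path in STORE_DIR are assumptions for illustration only; use whatever check fits the resource your job really depends on.

```shell
#!/bin/sh
# Sketch: ask SGE to requeue the job (exit code 99) when a needed directory
# has disappeared. STORE_DIR is a hypothetical path, adjust it to your setup.
STORE_DIR="${STORE_DIR:-/mnt/user-store}"

# check_store DIR: returns 0 if DIR is reachable, 99 otherwise.
check_store() {
    if [ -d "$1" ]; then
        return 0
    fi
    echo "resource $1 is gone, asking SGE to requeue this job" >&2
    return 99
}

# In the real job script, call this before (and between) the steps that
# need the resource:
#   check_store "$STORE_DIR" || exit 99
```

Exiting with 99 only helps for jobs running under SGE; a script started directly on a server simply stops.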
@Krinkle: You got that message while I was hacking on the live JSV script. I had simply copied the runtime warning message and then changed it. The change was so easy that I spared myself the trouble of disabling the JSV while rewriting it.
Currently there is, in total, enough CPU and memory free for all user scripts. SGE jobs are executed on five different servers, and more servers could be added easily. The main problem is load distribution: many users do not use SGE, which is bad on a shared system and leads to overload on a few servers. So please use cronie on host submit and qsub/qcronsub to submit jobs to SGE instead of running them directly on a specific server. Toolserver hardware is getting older, and servers may go away suddenly because of problems. With SGE you do not have to care about that.
Merlissimo
P.S.: I want to thank DaB. for his efforts to get more money for hardware for the Toolserver cluster next year. I also think this is really needed, especially for the database servers. You can follow the discussion at http://meta.wikimedia.org/wiki/Talk:Wikimedia_Deutschland/2013_annual_plan_draft/de#Toolserver.
On 24.09.2012 18:31, Krinkle wrote:
On Sep 24, 2012, at 6:20 PM, Platonides Platonides@gmail.com wrote:
On 24/09/12 18:07, Krinkle wrote:
Can someone decode this? What is this?
-- Krinkle
Begin forwarded message:
From: root@toolserver.org (Cron Daemon)
Subject: Cron krinkle@hawthorn qcronsub -N dbbot_wm -m n -j y -b y -l h_rt=INFINITY -l virtual_free=90M "$HOME/bots/dbbot-wm-start.sh"
Date: September 24, 2012 6:05:07 PM GMT+02:00
To: krinkle@toolserver.org
warning: Please add maximum runtime by adding parameter [33m-l arch=[0msol|lx
The text asks you to set a time limit. The parameter (wrapped in terminal color codes despite not being output to a terminal) is there to specify whether the job needs a Linux or a Solaris server.
However, if I try to execute it, I get a much saner message:
$ qcronsub -N dbbot_wm -m n -j y -b y -l h_rt=INFINITY -l virtual_free=90M "/home/krinkle/bots/dbbot-wm-start.sh"
Unable to run job: Script not executable: /home/krinkle/bots/dbbot-wm-start.sh.
Exiting.
warning: Please add the os this job can run on by adding parameter -l arch='*'|sol|lx
For more information read documentation at https://wiki.toolserver.org/view/Job_scheduling
As this is a php script, your parameter would be «-l arch='*'»
Yes, I've added `-l arch='*'` to it already a minute ago.
The warnings are gone. Not sure why it nagged about maximum runtime, since it already has INFINITY.
I'm not sure why arch=x isn't the default though, or maybe it is but outputs the warning anyway? A warning like that may be useful, but do consider that cronie from submit will send e-mails for it.
-- Krinkle