Hi,
I am planning to generate some maintenance reports for the Danish Wikipedia at regular intervals, such as once a week or once every few days. The exact time the programs run doesn't really matter.
So what time of day is best for running such programs?
And is there a way to tell SGE that it may choose the most convenient time to start the job?
Byrial Jensen byrial@vip.cybercity.dk wrote:
And is there a way to tell SGE that it may choose the most convenient time to start the job?
It will do that by itself; in fact, that is its whole purpose :-). Regarding the time:
| timl@yarrow:~$ perl -we 'my $t = rand (24 * 60); printf "%d:%02d\n", $t / 60, $t % 60;'
| 15:18
| timl@yarrow:~$
Does that suit you? :-)
Tim
On 10-04-2013 20:06, Tim Landscheidt wrote:
It will do that by itself, in fact, it is its whole purpose :-).
Well, no. SGE cannot know whether I want the job run as early as possible or whether I can happily wait several hours for a less busy part of the day, unless there is some option I can use to tell it that.
On 04/10/2013 02:41 PM, Byrial Jensen wrote:
unless there is some option I can use to tell that.
What Tim means is that, by default, SGE will schedule your job as soon as sufficient resources are actually available, rather than trying to predict when that will happen.
That said, you /can/ specify both an earliest start time (with -a) and a deadline (with -dl), creating a "window" during which SGE will try to run your job, but in general it's easier and more reliable to let the grid engine pick the time.
If your objective is to have your job run only when few others are trying to use the resources, you can also lower its priority (with -p) so that SGE will only execute your job when there isn't anything "better" to run.
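As a rough illustration of the -a and -p options mentioned above, here is a minimal sketch that only prints a qsub command line instead of submitting anything. The job script name report.sh, the date, and the priority value -100 are made up for the example; the time format shown is the [[CC]YY]MMDDhhmm form that qsub -a accepts, and ordinary users can normally only lower the priority (negative values).

```shell
#!/bin/sh
# Sketch only: prints a qsub command line, does not submit a job.
# report.sh is a hypothetical job script name.

# Format a start time as CCYYMMDDhhmm for qsub -a.
# Arguments: year month day hour minute (no leading zeros).
start_time() {
    printf '%04d%02d%02d%02d%02d' "$1" "$2" "$3" "$4" "$5"
}

# Build the command line.
# Arguments: start time, priority (-1023..0 for ordinary users), job script.
build_qsub() {
    printf 'qsub -a %s -p %s %s\n' "$1" "$2" "$3"
}

# Do not start before 2013-04-11 03:30 and run at lowered priority.
build_qsub "$(start_time 2013 4 11 3 30)" -100 report.sh
```

Pasting the printed line into a shell on the submit host would submit the job; whether a given priority value makes a noticeable difference depends on how the site's scheduler is configured.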
-- Marc
On 10.04.2013 20:54, Marc A. Pelletier wrote:
If you are using SGE, you do not really have to care about this. If you can use the whole cluster (Linux and Solaris), we mostly have enough capacity. What matters is that you specify which resources (memory, runtime) you need.
If you need user database access on s3, you simply add -l sql-s3-user=1. If you raise the number of DB resources you request, the replag must be lower for your job to be scheduled (e.g. -l sql-s3-user=3 currently only gets scheduled if the replag is below 1 hour).
The deadline option is not available on the Toolserver. -p mainly changes the priority relative to your own other jobs. For the global scheduling order, the job's waiting time and the server resources used by your user account in the last hours are more important.
Webserver requests, which also cause many database queries, are high at 14-23 UTC on workdays. Most SGE jobs are submitted between 0 and 3 UTC.
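To make the resource requests above concrete, here is a small sketch that assembles repeated -l options and prints the resulting qsub command line without submitting it. Only sql-s3-user comes from this thread; h_rt (runtime limit) and virtual_free (memory) are standard Grid Engine resource names used here as assumptions, and the values and the script name report.sh are invented for the example.

```shell
#!/bin/sh
# Sketch only: prints a qsub command line, does not submit a job.
# h_rt and virtual_free are assumed resource names; check what the
# cluster actually defines. report.sh is a hypothetical job script.

# Turn each argument (name=value) into a separate "-l name=value" option.
resource_args() {
    args=""
    for r in "$@"; do
        args="$args -l $r"
    done
    # Strip the leading space left by the loop.
    printf '%s\n' "${args# }"
}

# 30 minutes of runtime, 200M of memory, one s3 user database slot.
printf 'qsub %s %s\n' \
    "$(resource_args h_rt=0:30:00 virtual_free=200M sql-s3-user=1)" \
    report.sh
```

Requesting only what the job needs should make it easier for the scheduler to place it on any free host.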
Merlissimo
On 10-04-2013 21:35, Merlissimo wrote:
Thank you all for the explanations. They could be used to improve the documentation for SGE.
BTW I can use the whole cluster, as I made this little script to start my compiled C programs:
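#!/bin/sh
# Pick the binary built for this platform (uname prints e.g. Linux or SunOS).
ARCH=`uname`
PROG=$1
shift
# Quote "$@" so arguments containing spaces are passed through intact.
/home/byrial/bin/$ARCH/$PROG "$@"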
(anonymous) wrote:
[...]
If you are using SGE, you do not really have to care about this. If you can use the whole cluster (Linux and Solaris), we mostly have enough capacity. What matters is that you specify which resources (memory, runtime) you need.
[...]
BTW, is the "Queue State" on http://munin.toolserver.org/Miscellaneous/turnera/index.html#sge the number of queued jobs? Mark, is there something Munin-like on Labs? Does Icinga have graphs?
Tim
On 04/10/2013 07:51 PM, Tim Landscheidt wrote:
Does Icinga have graphs?
I wasn't sure you were talking to me. :-)
Icinga has graphs, but they're uptime-related. The place with the pretty graphics is Ganglia:
http://ganglia.wmflabs.org/latest/?c=tools
But there is no queued/running job graph right now. It's a very good idea though, and I'll add the matching instrumentation.
-- Marc
toolserver-l@lists.wikimedia.org