Hello all!
I have a bunch of cronie jobs that call qcronsub several times with very similar settings (only the language of the wiki changes). There are 5 jobs in total. Regularly (about every 2nd day) one of those jobs does not get executed, and it is always the same one. I do not get any error mail. Any idea?
Thanks a lot and greetings DrTrigon
Hello,
which one is it exactly?
Cheers nosy
On Fri, 15 Jun 2012, Dr. Trigon wrote:
Date: Fri, 15 Jun 2012 10:57:43
From: Dr. Trigon dr.trigon@surfeu.ch
Reply-To: toolserver-l@lists.wikimedia.org
To: Toolserver-l@lists.wikimedia.org
Subject: [Toolserver-l] Anoter SGE question
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
Hello nosy!
Thanks for your reply!
It is this one:
0 1 * * * qcronsub -l h_rt=06:00:00 -l virtual_free=200M -m as -j y -b y -N subster_en $HOME/pywikipedia/bot_control.py -subster -cron -lang:en >/dev/null
Greetings DrTrigon
On 15.06.2012 17:51, Marlen Caemmerer wrote:
Hello,
which one is it exactly?
Cheers nosy
All jobs named "subster_en" were executed successfully by SGE (mostly on wolfsbane). The return code of your Python script was 0, but the runtime was only about a minute. You can check this by executing e.g. "qacct -j subster_en -d 10". So you should check your log files ~drtrigon/subster_en.o2160088, ~drtrigon/subster_en.o2156743, ... to see whether something is wrong with your Python script.
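A minimal sketch of how the accounting check above could be automated. It assumes that the "qacct -j" output consists of records separated by lines of "=" characters, each holding "key value" lines, and that the fields jobname, start_time, exit_status and ru_wallclock are present. The sample text, including the job numbers, is made up for illustration and is not taken from the real accounting data.

```python
from typing import Dict, List

# Hypothetical sample in the assumed "qacct -j" layout (not real accounting data).
SAMPLE = """\
==============================================================
jobname      subster_en
jobnumber    1000001
start_time   Fri Jun 15 01:00:12 2012
exit_status  0
ru_wallclock 62
==============================================================
jobname      subster_en
jobnumber    1000002
start_time   Sat Jun 16 01:00:09 2012
exit_status  0
ru_wallclock 5400
"""


def parse_qacct(text: str) -> List[Dict[str, str]]:
    """Split qacct-style output into one dict per accounting record."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if line.startswith("====="):
            # A separator line closes the previous record, if there was one.
            if current:
                records.append(current)
            current = {}
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            current[parts[0]] = parts[1].strip()
    if current:
        records.append(current)
    return records


def short_runs(records: List[Dict[str, str]], max_seconds: float = 120.0):
    """Return records that exited with status 0 but ran for at most max_seconds."""
    result = []
    for rec in records:
        # Tolerate a trailing "s" in case the wallclock is printed with a unit.
        wallclock = float(rec.get("ru_wallclock", "0").rstrip("s"))
        if rec.get("exit_status") == "0" and wallclock <= max_seconds:
            result.append(rec)
    return result


records = parse_qacct(SAMPLE)
suspicious = short_runs(records)
for rec in suspicious:
    print(rec["jobname"], rec["start_time"], rec["ru_wallclock"])
```

To run it on real data, the text could come from something like subprocess.run(["qacct", "-j", "subster_en", "-d", "10"], capture_output=True, text=True).stdout. The 120 second threshold is an arbitrary choice for "about a minute".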
For your other questions: stderr and stdout are buffered by SGE because they are sent over the network. In the Toolserver configuration they are sent to localhost by default, because all execution servers have the same filesystems mounted. In a standard cluster configuration the out/err files are written on the submit host instead.
Merlissimo
PS: qcronsub does not output anything if the job was submitted successfully and all resources were requested correctly. So there is no need to send the output to /dev/null.
On 17.06.2012 00:07, Merlissimo wrote:
All jobs named "subster_en" were executed successfully by SGE (mostly on wolfsbane). The return code of your Python script was 0, but the runtime was only about a minute. You can check this by executing e.g. "qacct -j subster_en -d 10". So you should check your log files ~drtrigon/subster_en.o2160088, ~drtrigon/subster_en.o2156743, ... to see whether something is wrong with your Python script.
Thanks for the hint - I bet I would have forgotten to write the 'q' in 'qacct'... ;)
So, as you advised me, I entered:
qacct -j subster_en -d 10
and was surprised to find that it was not always the same (this) job that did not execute. BUT when looking more closely I found e.g.:
* subster_en was NOT executed on: Fri Jun 15
* mainbot was NOT executed on: Sun Jun 17
(I got the wrong impression that it was always the same one because it was always/mostly 1 script that was missing... The ones in between those dates are harder to judge, because I started some of them manually...)
I additionally checked all the logs you mentioned, but did not find anything related.
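A small sketch of how such gaps could be listed systematically instead of by eye. It assumes the start dates of a daily job have already been collected (for instance from the qacct output); the run dates below are illustrative, chosen only to match the missing Fri Jun 15 run of subster_en mentioned above.

```python
from datetime import date, timedelta
from typing import Iterable, List


def missing_days(run_dates: Iterable[date], first: date, last: date) -> List[date]:
    """Return every day in [first, last] on which no run was recorded."""
    seen = set(run_dates)
    missing = []
    day = first
    while day <= last:
        if day not in seen:
            missing.append(day)
        day += timedelta(days=1)
    return missing


# Illustrative start dates for a job that is scheduled once per day.
subster_en_runs = [date(2012, 6, 14), date(2012, 6, 16), date(2012, 6, 17)]

gaps = missing_days(subster_en_runs, date(2012, 6, 14), date(2012, 6, 17))
for day in gaps:
    print("subster_en was NOT executed on:", day.isoformat())
```

Manually started runs would show up as ordinary runs here, so days that were covered by hand would need to be filtered out before drawing conclusions.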
So why are some of my cronie-jobs or qcronsub calls (typically 1 per day) silently dropped?
Thanks and greetings DrTrigon
On Sun, Jun 17, 2012, at 02:29 PM, Dr. Trigon wrote:
So why are some of my cronie-jobs or qcronsub calls (typically 1 per day) silently dropped?
I have been having similar experiences lately, and opened a JIRA bug [1] to report it. So far, after more than a week, there have been no other comments on the bug. If other users are also having problems with cron jobs not running, perhaps you could add your reports to this bug and maybe this information will help the admins to diagnose the problem.
So I spent some hours investigating this on my side and summarized everything in:
https://jira.toolserver.org/browse/TS-1402#comment-21415
As it looks to me like this issue is getting worse, I am really grateful for every hint in any direction (ok, any useful one ;)...!
Thanks a lot and greetings DrTrigon
On 20.06.2012 02:04, Russell Blau wrote:
On Sun, Jun 17, 2012, at 02:29 PM, Dr. Trigon wrote:
So why are some of my cronie-jobs or qcronsub calls (typically 1 per day) silently dropped?
I have been having similar experiences lately, and opened a JIRA bug [1] to report it. So far, after more than a week, there have been no other comments on the bug. If other users are also having problems with cron jobs not running, perhaps you could add your reports to this bug and maybe this information will help the admins to diagnose the problem.
[1] https://jira.toolserver.org/browse/TS-1402
For your other questions: stderr and stdout are buffered by SGE because they are sent over the network. In the Toolserver configuration they are sent to localhost by default, because all execution servers have the same filesystems mounted. In a standard cluster configuration the out/err files are written on the submit host instead.
So would the hint given by Platonides before (btw.: thanks a lot for this!) be of any help?
On 12.06.2012 22:48, Platonides wrote:
That looks like line buffering in stdio. You can try prepending the python command with: stdbuf -e0
(despite the fact that stderr should be unbuffered by default...)
I'm unsure if it's being buffered at python or if SGE is doing caching there, though. It _should_ be simply passing the file descriptor but, who knows?
Either to get SGE to behave like Python? Or vice versa?
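Whether the buffering happens in Python or in SGE is not settled in this thread. On the Python side, one thing that can be tried is flushing explicitly after every log line, sketched below. The helper name log is hypothetical and not part of pywikipedia; an in-memory stream stands in for the job's output file.

```python
import io
import sys
from typing import Optional, TextIO


def log(message: str, stream: Optional[TextIO] = None) -> None:
    """Write one line and flush immediately, so it does not sit in a stdio buffer."""
    target = sys.stdout if stream is None else stream
    target.write(message + "\n")
    target.flush()


# Demo against an in-memory stream instead of the real job output file.
demo = io.StringIO()
log("job started", demo)
log("job finished", demo)
```

Alternatives that need no code change are running the interpreter as "python -u" or setting the PYTHONUNBUFFERED environment variable. Neither would help if the delay comes from SGE itself.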
PS: qcronsub does not output anything if the job was submitted successfully and all resources were requested correctly. So there is no need to send the output to /dev/null.
;)) Thanks for the hint; I cannot remember, but there HAS TO BE a reason why I finally decided to add '> /dev/null' ... might be just 1 line dropped by SGE or something like that... will check that sometime!! tks! :)
Greetings and thanks for all the hints! DrTrigon
On 16/06/12 23:33, Dr. Trigon wrote:
I have a bunch of cronie jobs that call qcronsub several times with very similar settings (only the language of the wiki changes). There are 5 jobs in total. Regularly (about every 2nd day) one of those jobs does not get executed, and it is always the same one. I do not get any error mail. Any idea?
Is it by chance your last cron entry? Remember that there must be a newline finalising your crontab or the last command won't get executed.
You should get this output:
$ crontab -l | tail -c1 | od -c
0000000  \n
0000001
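The same check can be sketched in Python, for example as part of a script that validates a crontab before installing it. The function name and the two sample entries are made up for illustration.

```python
def last_line_is_terminated(crontab_bytes: bytes) -> bool:
    """True if the crontab is empty or its final byte is a newline."""
    return crontab_bytes == b"" or crontab_bytes.endswith(b"\n")


# Hypothetical crontab contents: one with and one without the final newline.
GOOD = b"0 1 * * * echo hello\n"
BAD = b"0 1 * * * echo hello"

print(last_line_is_terminated(GOOD))
print(last_line_is_terminated(BAD))
```

On a real system the bytes could be taken from subprocess.run(["crontab", "-l"], capture_output=True).stdout.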
No it is not - but thanks a lot for this hint! I also checked:
drtrigon@clematis:~$ cronie -l | tail -c1 | od -c
0000000  \n
0000001
...looks ok to me. Below is the full output of 'cronie -l' for the sake of completeness:
drtrigon@clematis:~$ cronie -l
# DO NOT EDIT THIS FILE - edit the master and reinstall.
# (/tmp/crontab.ji5t0i/crontab installed on Wed Dec 23 11:02:53 2009)
# (Cron version -- $Id: crontab.c,v 2.13 1994/01/17 03:20:37 vixie Exp $)
# m h dom mon dow command
#0 2 * * * cronsub -sl mainbot $HOME/pywikipedia/bot_control.py -default -cron
0 2 * * * qcronsub -l h_rt=12:00:00 -l virtual_free=500M -m as -j y -b y -N mainbot $HOME/pywikipedia/bot_control.py -default -cron >/dev/null
#0 0 */14 * * cronsub -s compbot $HOME/pywikipedia/bot_control.py -compress_history:[] -cron
0 0 */14 * * qcronsub -l h_rt=02:00:00 -l virtual_free=100M -m as -j y -b y -N compbot $HOME/pywikipedia/bot_control.py -compress_history:[] -cron >/dev/null
##0 6 * * * cronsub -s substerbot $HOME/pywikipedia/subster_beta.py 2>> $HOME/public_html/DrTrigonBot/subster.html
#0 0 * * * cronsub -sl ircbot $HOME/pywikipedia/bot_control.py -subster_irc -cron
0 0 * * * qcronsub -l h_rt=INFINITY -l virtual_free=200M -m as -j y -b y -N ircbot $HOME/pywikipedia/bot_control.py -subster_irc -cron >/dev/null
#30 0 * * * cronsub -s subster_frr $HOME/pywikipedia/bot_control.py -subster -cron -lang:frr
30 0 * * * qcronsub -l h_rt=06:00:00 -l virtual_free=200M -m as -j y -b y -N subster_frr $HOME/pywikipedia/bot_control.py -subster -cron -lang:frr >/dev/null
#0 1 * * * cronsub -s subster_en $HOME/pywikipedia/bot_control.py -subster -cron -lang:en
0 1 * * * qcronsub -l h_rt=06:00:00 -l virtual_free=200M -m as -j y -b y -N subster_en $HOME/pywikipedia/bot_control.py -subster -cron -lang:en >/dev/null
30 1 * * * cronsub -s subster_nl $HOME/pywikipedia/bot_control.py -subster -cron -lang:nl
##30 1 * * * cronsub -s subster_ar $HOME/pywikipedia/bot_control.py -subster -cron -lang:ar
#0 * * * * cronsub -s subster_ar $HOME/pywikipedia/bot_control.py -subster -cron -lang:ar
0 * * * * qcronsub -l h_rt=02:00:00 -l virtual_free=200M -m as -j y -b y -N subster_ar $HOME/pywikipedia/bot_control.py -subster -cron -lang:ar >/dev/null
0 0 * * * qcronsub -l h_rt=06:00:00 -l virtual_free=200M -m as -j y -b y -N subster_meta $HOME/pywikipedia/bot_control.py -subster -cron -family:meta -lang: >/dev/null

#0 0 * * * cronsub -s maintenance $HOME/warnuserquota.py
0 0 * * * qcronsub -l h_rt=00:05:00 -l virtual_free=50M -m as -j y -b y -N maintenance $HOME/warnuserquota.py >/dev/null
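A rough sketch for listing which entries of such a crontab are actually active, so that the expected job names can be compared against what qacct reports. It assumes simple entries with five schedule fields followed by the command, and it reads the job name from the -N option used in the qcronsub calls above. The function name is hypothetical; the sample holds three entries copied from the listing.

```python
import shlex
from typing import List, Optional, Tuple

# Three entries copied from the crontab listing above.
SAMPLE = """\
#0 1 * * * cronsub -s subster_en $HOME/pywikipedia/bot_control.py -subster -cron -lang:en
0 1 * * * qcronsub -l h_rt=06:00:00 -l virtual_free=200M -m as -j y -b y -N subster_en $HOME/pywikipedia/bot_control.py -subster -cron -lang:en >/dev/null
30 1 * * * cronsub -s subster_nl $HOME/pywikipedia/bot_control.py -subster -cron -lang:nl
"""


def active_jobs(crontab_text: str) -> List[Tuple[str, str, Optional[str]]]:
    """Return (schedule, submit command, -N job name or None) for each active entry."""
    jobs = []
    for raw in crontab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out entries
        tokens = shlex.split(line)
        if len(tokens) < 6:
            continue  # not a "five fields plus command" entry
        schedule, command = tokens[:5], tokens[5:]
        name = None
        if "-N" in command:
            idx = command.index("-N")
            if idx + 1 < len(command):
                name = command[idx + 1]
        jobs.append((" ".join(schedule), command[0], name))
    return jobs


jobs = active_jobs(SAMPLE)
for schedule, submitter, name in jobs:
    print(schedule, submitter, name)
```

In this sample the subster_nl entry is still submitted through cronsub and therefore has no -N name. Whether that has anything to do with the dropped jobs is not established here.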
Greetings DrTrigon
Today 2 jobs did not execute! Does this mean it is getting worse? Is the toolserver about to die?
Greetings DrTrigon
Hello all!
Here is a final update:
- on one hand the situation is better now than 2 weeks ago (I don't know whether this could be related to the SGE update/maintenance)
- on the other hand I was not able to find out why these issues came up (and this is bad)
But after all, at the moment it works, and I just wanted to thank all of you involved here for your help and work!!
Greetings DrTrigon