Hi,
I got the email below telling me that my cron job running as william-avery-bot had thrown an error, and I noticed that the Grid job it kicks off hasn't run since.
I tried deleting the job using the instructions at https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#Stopping_jobs_with_%... but the job appeared "stuck".
"qstat -xml" outputs the following:
<?xml version='1.0'?>
<job_info xmlns:xsd=" http://arc.liv.ac.uk/repos/darcs/sge/source/dist/util/resources/schemas/qsta... ">
  <queue_info>
    <job_list state="running">
      <JB_job_number>9999749</JB_job_number>
      <JAT_prio>0.25319</JAT_prio>
      <JB_name>cron-TaxonbarSyncerBot</JB_name>
      <JB_owner>tools.william-avery-bot</JB_owner>
      <state>dr</state>
      <JAT_start_time>2021-03-25T17:49:16</JAT_start_time>
      <queue_name>task@tools-sgeexec-0916.tools.eqiad.wmflabs</queue_name>
      <slots>1</slots>
    </job_list>
  </queue_info>
  <job_info>
  </job_info>
</job_info>
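For anyone who wants to check for this condition from a script, here is a minimal sketch of picking out jobs whose state contains "d" (deletion requested, as in the "dr" state above) from "qstat -xml" output. The function name find_deleting_jobs and the SAMPLE string are hypothetical, and the sample omits the truncated xmlns:xsd attribute; it is an illustration under those assumptions, not part of any Toolforge tooling.

```python
import xml.etree.ElementTree as ET


def find_deleting_jobs(qstat_xml):
    """Return (job_number, name, state, queue) for jobs flagged for deletion.

    Assumes the element names seen in the qstat output quoted in this thread.
    """
    root = ET.fromstring(qstat_xml)
    flagged = []
    # job_list elements can sit under queue_info or the nested job_info,
    # so walk the whole tree instead of assuming one fixed parent.
    for job in root.iter("job_list"):
        state = job.findtext("state", default="")
        if "d" in state:  # "dr" = deletion requested while still marked running
            flagged.append((
                job.findtext("JB_job_number", default=""),
                job.findtext("JB_name", default=""),
                state,
                job.findtext("queue_name", default=""),
            ))
    return flagged


# Hypothetical sample modelled on the output above (namespace attribute omitted).
SAMPLE = """<?xml version='1.0'?>
<job_info>
  <queue_info>
    <job_list state="running">
      <JB_job_number>9999749</JB_job_number>
      <JB_name>cron-TaxonbarSyncerBot</JB_name>
      <JB_owner>tools.william-avery-bot</JB_owner>
      <state>dr</state>
      <queue_name>task@tools-sgeexec-0916.tools.eqiad.wmflabs</queue_name>
    </job_list>
  </queue_info>
  <job_info>
  </job_info>
</job_info>
"""

if __name__ == "__main__":
    for number, name, state, queue in find_deleting_jobs(SAMPLE):
        print(number, name, state, queue)
```

A job that stays in a "d" state for a long time with no matching processes on the exec host is the kind of entry that may need an admin to clear.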
But when I ssh to tools-sgeexec-0916.tools.eqiad.wmflabs I see no sign of any processes under tools.william-avery-bot, except the ones associated with my interactive session.
Can anyone help resolve this, or suggest a venue where I should raise it?
Thanks in advance,
Will
---------- Forwarded message ---------
From: Cron Daemon root@tools.wmflabs.org
Date: Thu, 25 Mar 2021 at 16:49
Subject: Cron tools.william-avery-bot@tools-sgecron-01 /usr/bin/jsub -N cron-TaxonbarSyncerBot -once -quiet ~/TaxonbarSyncerBot.sh
To: tools.william-avery-bot@tools.wmflabs.org
error: commlib error: got select error (Connection refused)
error: unable to send message to qmaster using port 6444 on host "tools-sgegrid-shadow.tools.eqiad1.wikimedia.cloud": got send error
Traceback (most recent call last):
  File "/usr/bin/job", line 48, in <module>
    root = xml.etree.ElementTree.fromstring(proc.stdout.read())
  File "/usr/lib/python3.5/xml/etree/ElementTree.py", line 1345, in XML
    return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
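The traceback above is consistent with the wrapper trying to parse empty output: when the scheduler cannot be reached, nothing is written to stdout, and ElementTree raises ParseError on empty input. The sketch below shows one way a caller might guard against that. The names SchedulerUnavailable and parse_qstat_output are hypothetical; this is not the actual code of /usr/bin/job, only an illustration of the failure mode.

```python
import xml.etree.ElementTree as ET


class SchedulerUnavailable(RuntimeError):
    """Hypothetical error raised when qstat produced no usable XML."""


def parse_qstat_output(stdout_bytes, stderr_bytes=b""):
    """Parse captured qstat XML, turning empty or broken output into a clear error."""
    detail = stderr_bytes.decode("utf-8", errors="replace").strip()
    if not stdout_bytes.strip():
        # Nothing on stdout: the scheduler most likely could not be contacted.
        raise SchedulerUnavailable(detail or "qstat produced no output")
    try:
        return ET.fromstring(stdout_bytes)
    except ET.ParseError as exc:
        raise SchedulerUnavailable("could not parse qstat output: %s" % exc) from exc


if __name__ == "__main__":
    try:
        parse_qstat_output(b"", b"error: commlib error: got select error (Connection refused)")
    except SchedulerUnavailable as err:
        print("scheduler unavailable:", err)
```

Raising a dedicated error keeps the qmaster connection failure visible in the cron email instead of burying it under an XML parsing traceback.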
On Fri, Mar 26, 2021 at 3:27 PM William Avery willm.avery@gmail.com wrote:
Hi,
I got the email below telling me that my cron job running as william-avery-bot had thrown an error, and I noticed that the Grid job it kicks off hasn't run since.
I tried deleting the job using the instructions at https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#Stopping_jobs_with_%... but the job appeared "stuck".
I have "force deleted" your job using my Toolforge admin rights.
$ sudo qdel -f 9999749
root forced the deletion of job 9999749
The Toolforge grid engine had numerous problems yesterday which led to the scheduler losing track of the state of many jobs. Brooke did several rounds of looking for these and cleaning the queue state, but obviously yours was not cleaned up in that process. Thank you for your report, and I hope you can get your tool back into its proper working state.
Bryan
Thanks Bryan,
It has now resumed its not particularly critical task: https://www.wikidata.org/wiki/Special:Contributions/William_Avery_Bot
Will
On Fri, 26 Mar 2021 at 21:45, Bryan Davis bd808@wikimedia.org wrote:
Bryan Davis, Technical Engagement, Wikimedia Foundation
Principal Software Engineer, Boise, ID USA
[[m:User:BDavis_(WMF)]] irc: bd808
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l