Hi,

I got the email below telling me that my cron job running as william-avery-bot had thrown an error, and I noticed that the Grid job it kicks off hasn't run since.

I tried deleting the job using the instructions at https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#Stopping_jobs_with_%E2%80%98qdel%E2%80%99_and_%E2%80%98jstop%E2%80%99 but the deletion appears to be "stuck": the job is still listed.

"qstat -xml" outputs the following:
<?xml version='1.0'?>
<job_info  xmlns:xsd="http://arc.liv.ac.uk/repos/darcs/sge/source/dist/util/resources/schemas/qstat/qstat.xsd">
  <queue_info>
    <job_list state="running">
      <JB_job_number>9999749</JB_job_number>
      <JAT_prio>0.25319</JAT_prio>
      <JB_name>cron-TaxonbarSyncerBot</JB_name>
      <JB_owner>tools.william-avery-bot</JB_owner>
      <state>dr</state>
      <JAT_start_time>2021-03-25T17:49:16</JAT_start_time>
      <queue_name>task@tools-sgeexec-0916.tools.eqiad.wmflabs</queue_name>
      <slots>1</slots>
    </job_list>
  </queue_info>
  <job_info>
  </job_info>
</job_info>

But when I ssh to tools-sgeexec-0916.tools.eqiad.wmflabs I see no sign of any processes under tools.william-avery-bot, except the ones associated with my interactive session.
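
In case it helps anyone checking for the same symptom, here is a minimal Python sketch for picking such jobs out of "qstat -xml" output. It is my own illustration, not part of any Toolforge tooling: the function name stuck_jobs and the SAMPLE string are hypothetical, and it assumes that a "d" in the <state> field means the job is marked for deletion (as in the "dr" state above). It also returns an empty list when qstat prints nothing, rather than raising a ParseError.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the qstat -xml output quoted above (illustration only).
SAMPLE = """
<job_info>
  <queue_info>
    <job_list state="running">
      <JB_job_number>9999749</JB_job_number>
      <JB_name>cron-TaxonbarSyncerBot</JB_name>
      <state>dr</state>
      <queue_name>task@tools-sgeexec-0916.tools.eqiad.wmflabs</queue_name>
    </job_list>
  </queue_info>
  <job_info>
  </job_info>
</job_info>
"""


def stuck_jobs(qstat_xml):
    """Return (job number, name, queue) for jobs whose state contains 'd'.

    Assumption: 'd' in the state string marks a job pending deletion.
    """
    text = qstat_xml.strip()
    if not text:
        # qstat produced no output (e.g. qmaster unreachable): nothing to parse.
        return []
    root = ET.fromstring(text)
    found = []
    for job in root.iter("job_list"):
        state = job.findtext("state", default="")
        if "d" in state:
            found.append((
                job.findtext("JB_job_number"),
                job.findtext("JB_name"),
                job.findtext("queue_name"),
            ))
    return found
```

Calling stuck_jobs(SAMPLE) should report job 9999749 on tools-sgeexec-0916, which is the job I can't get rid of.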

Can anyone help resolve this, or suggest the right venue to raise it?

Thanks in advance,

Will

---------- Forwarded message ---------
From: Cron Daemon <root@tools.wmflabs.org>
Date: Thu, 25 Mar 2021 at 16:49
Subject: Cron <tools.william-avery-bot@tools-sgecron-01> /usr/bin/jsub -N cron-TaxonbarSyncerBot -once -quiet ~/TaxonbarSyncerBot.sh
To: <tools.william-avery-bot@tools.wmflabs.org>


error: commlib error: got select error (Connection refused)
error: unable to send message to qmaster using port 6444 on host "tools-sgegrid-shadow.tools.eqiad1.wikimedia.cloud": got send error
Traceback (most recent call last):
  File "/usr/bin/job", line 48, in <module>
    root = xml.etree.ElementTree.fromstring(proc.stdout.read())
  File "/usr/lib/python3.5/xml/etree/ElementTree.py", line 1345, in XML
    return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 1, column 0