I've also had one of my Java bots on willow crash with an OutOfMemoryError (the timestamp is when the exception happened):

Exiting at Sat Apr 21 20:31:45 GMT 2012
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:640)
<snip huge stack trace>

Shubinator

On Sun, Apr 22, 2012 at 9:53 AM, Krinkle <krinklemail@gmail.com> wrote:
Over the past 12 hours I've also gotten a fair number of error reports by mail and on IRC about some of my projects.

A few samples:

* pywikipedia script:

> Your "cron" job on willow
> python $HOME/SVN/pywikipedia/fileprotectionsync_live.py > $HOME/bots/py_fileprotectionsync_live.log 2>&1
>
> produced the following output:
>
> ld.so.1: sh: fatal: mmap anon failed: Resource temporarily unavailable
> ld.so.1: sh: fatal: /usr/lib/libc.so.1: mmap failed: Resource temporarily unavailable
> ld.so.1: sh: fatal: libc.so.1: open failed: No such file or directory


* custom php script for [[m:CVN]]

> Your "cron" job on willow
> php $HOME/CVN-backend/cronjob_cvnapi.php > $HOME/CVN-backend/cronjob_cvnapi.log 2>&1
>
> produced the following output:
>
> Killed

* minutely start attempt for a long-running shell script in SGE

> Your "cron" job on willow
> cronsub -l -s dbbot_wm $HOME/bots/dbbot-wm-start.sh
>
> produced the following output:
>
> /opt/local/bin/cronsub[52]: 28142 Killed


* minutely start attempt for a long-running shell script in SGE

> Your "cron" job on willow
> cronsub -l -s dbbot_wm $HOME/bots/dbbot-wm-start.sh
>
> produced the following output:
>
> critical error: malloc() failure

* wmfCodeSearch exec

> Your "cron" job on willow
> php $HOME/wss_backend/runJobs.php > $HOME/logs/wmfCodeSearch/runJobs.log 2>&1
>
> produced the following output:
>
> Segmentation Fault - core dumped

etc.

Due to the diversity of the errors, I can't find any link between the various failures.

I hope these help in determining/fixing the issue.

-- Krinkle


On Apr 22, 2012, at 6:51 AM, DeltaQuad wrote:

> OK, so I've checked status.toolserver.org and found nothing going
> on, but some stuff has been breaking at random points today.
>
> I've had a cron job return:
> Subject: Cron <deltaquad@willow> cronsub IPBEBot $HOME/IPBE/IPBE.py
>
> /opt/local/bin/cronsub[44]: 9793 Killed
>
>
> I've had another return:
> Subject: Cron <deltaquad@willow> cronsub UAABot $HOME/UAA/UAA.py
>
> error: not enough memory to allocate 2404 bytes in init_packbuffer
> Unable to run job: Error reading answer list from qmaster.
> Exiting.
>
> And then on a MMP (yes, this is a customized message):
>
> The x script has failed. The error message received was:
> <b>A database error occured when attempting to process your request: </b><br />Failed to connect to database server !
> Please check the database to resolve this issue and ensure that private data is removed on schedule.
>
>
> Is it what we have running, or is this a toolserver issue in general? And
> should I file a bug?
>
> (SysAdmins - Especially DaB. please don't take this as me being
> critical, I just wanna help if I can identify any problems and file a
> JIRA if needed ;) )
>
> --
> DeltaQuad
> English Wikipedia Administrator
>
>
> _______________________________________________
> Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
> https://lists.wikimedia.org/mailman/listinfo/toolserver-l
> Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette

