I've also had one of my Java bots on willow crash with an OutOfMemoryError (the timestamp is when the exception happened):
Exiting at Sat Apr 21 20:31:45 GMT 2012
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:640)
<snip huge stack trace>
Shubinator
On Sun, Apr 22, 2012 at 9:53 AM, Krinkle krinklemail@gmail.com wrote:
Over the past 12 hours I've also gotten a fair number of error reports by mail and on IRC about some of my projects.
A few samples:
- pywikipedia script:
Your "cron" job on willow python $HOME/SVN/pywikipedia/fileprotectionsync_live.py > $HOME/bots/py_fileprotectionsync_live.log 2>&1
produced the following output:
ld.so.1: sh: fatal: mmap anon failed: Resource temporarily unavailable
ld.so.1: sh: fatal: /usr/lib/libc.so.1: mmap failed: Resource temporarily unavailable
ld.so.1: sh: fatal: libc.so.1: open failed: No such file or directory
- custom php script for [[m:CVN]]
Your "cron" job on willow php $HOME/CVN-backend/cronjob_cvnapi.php > $HOME/CVN-backend/cronjob_cvnapi.log 2>&1
produced the following output:
Killed
- minutely start attempt for a long-running shell script in SGE
Your "cron" job on willow cronsub -l -s dbbot_wm $HOME/bots/dbbot-wm-start.sh
produced the following output:
/opt/local/bin/cronsub[52]: 28142 Killed
- minutely start attempt for a long-running shell script in SGE
Your "cron" job on willow cronsub -l -s dbbot_wm $HOME/bots/dbbot-wm-start.sh
produced the following output:
critical error: malloc() failure
- wmfCodeSearch exec
Your "cron" job on willow php $HOME/wss_backend/runJobs.php > $HOME/logs/wmfCodeSearch/runJobs.log 2>&1
produced the following output:
Segmentation Fault - core dumped
etc.
Due to the diversity of the errors, I can't find any link between the various failures.
I hope these help in determining/fixing the issue.
-- Krinkle
On Apr 22, 2012, at 6:51 AM, DeltaQuad wrote:
OK, so I've checked status.toolserver.org and found nothing going on, but some stuff has been breaking at random points today.
I've had a Cron job return: Subject: Cron deltaquad@willow cronsub IPBEBot $HOME/IPBE/IPBE.py
/opt/local/bin/cronsub[44]: 9793 Killed
I've had another return: Subject: Cron deltaquad@willow cronsub UAABot $HOME/UAA/UAA.py
error: not enough memory to allocate 2404 bytes in init_packbuffer
Unable to run job: Error reading answer list from qmaster.
Exiting.
And then on a MMP (yes this is a customized message):
The x script has failed. The error message received was: <b>A database error occured when attempting to process your request:</b><br />Failed to connect to database server !
Please check the database to resolve this issue and ensure that private data is removed on schedule.
Is it something we have running, or is this a Toolserver issue in general? And should I file a bug?
(Sysadmins, especially DaB.: please don't take this as me being critical. I just want to help if I can identify any problems, and file a JIRA ticket if needed ;) )
-- DeltaQuad English Wikipedia Administrator
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette