Ok, so I've checked status.toolserver.org and found nothing going on, but some stuff has been breaking at random points today.
I've had a Cron job return:
Subject: Cron deltaquad@willow cronsub IPBEBot $HOME/IPBE/IPBE.py
/opt/local/bin/cronsub[44]: 9793 Killed
I've had another return:
Subject: Cron deltaquad@willow cronsub UAABot $HOME/UAA/UAA.py
error: not enough memory to allocate 2404 bytes in init_packbuffer
Unable to run job: Error reading answer list from qmaster.
Exiting.
And then on an MMP (yes, this is a customized message):
The x script has failed. The error message received was: <b>A database error occured when attempting to process your request: </b><br />Failed to connect to database server ! Please check the database to resolve this issue and ensure that private data is removed on schedule.
Is it something we have running, or is this a Toolserver issue in general? And should I file a bug?
(SysAdmins, especially DaB., please don't take this as me being critical; I just wanna help if I can identify any problems, and file a JIRA if needed ;) )
-- DeltaQuad English Wikipedia Administrator
Over the past 12 hours I've also gotten a fair number of error reports by mail and on IRC about some of my projects.
A few samples:
* pywikipedia script:
Your "cron" job on willow python $HOME/SVN/pywikipedia/fileprotectionsync_live.py > $HOME/bots/py_fileprotectionsync_live.log 2>&1
produced the following output:
ld.so.1: sh: fatal: mmap anon failed: Resource temporarily unavailable
ld.so.1: sh: fatal: /usr/lib/libc.so.1: mmap failed: Resource temporarily unavailable
ld.so.1: sh: fatal: libc.so.1: open failed: No such file or directory
* custom php script for [[m:CVN]]
Your "cron" job on willow php $HOME/CVN-backend/cronjob_cvnapi.php > $HOME/CVN-backend/cronjob_cvnapi.log 2>&1
produced the following output:
Killed
* minutely start attempt for a long-running shell script in SGE
Your "cron" job on willow cronsub -l -s dbbot_wm $HOME/bots/dbbot-wm-start.sh
produced the following output:
/opt/local/bin/cronsub[52]: 28142 Killed
* minutely start attempt for a long-running shell script in SGE
Your "cron" job on willow cronsub -l -s dbbot_wm $HOME/bots/dbbot-wm-start.sh
produced the following output:
critical error: malloc() failure
* wmfCodeSearch exec
Your "cron" job on willow php $HOME/wss_backend/runJobs.php > $HOME/logs/wmfCodeSearch/runJobs.log 2>&1
produced the following output:
Segmentation Fault - core dumped
etc. etc.
Due to the diversity of the errors, I can't find any link between the various failures.
I hope these help in determining/fixing the issue.
-- Krinkle
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
I've also had one of my Java bots on willow crash with an OutOfMemoryError (the timestamp is when the exception happened):
Exiting at Sat Apr 21 20:31:45 GMT 2012
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:640)
<snip huge stack trace>
Shubinator
On Sun, Apr 22, 2012 at 9:53 AM, Krinkle krinklemail@gmail.com wrote:
<snip quoted error samples and earlier message>
On 04/22/2012 12:51 AM, DeltaQuad wrote:
The x script has failed. The error message received was: <b>A database error occured when attempting to process your request: </b><br />Failed to connect to database server ! Please check the database to resolve this issue and ensure that private data is removed on schedule.
Just to clear this one up, this turned out to be "code got changed in the production environment without going through version control and QA, and consequently broke stuff." I can't speak to the others.
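For the reports other than the database one, a rough way to look for a common thread is to tally which messages mention memory or resource allocation. The sketch below is only an illustration: the `SAMPLES` strings are copied from the messages in this thread, while the `ALLOCATION_HINTS` keyword list and the `mentions_allocation` and `tally` helpers are assumptions made up for this example, not a Toolserver tool or an official classification. A bare "Killed" is counted separately because the message alone does not say why the process was killed.

```python
# Illustrative only: error strings copied from the reports in this thread.
SAMPLES = [
    "/opt/local/bin/cronsub[44]: 9793 Killed",
    "error: not enough memory to allocate 2404 bytes in init_packbuffer",
    "ld.so.1: sh: fatal: mmap anon failed: Resource temporarily unavailable",
    "Killed",
    "/opt/local/bin/cronsub[52]: 28142 Killed",
    "critical error: malloc() failure",
    "Segmentation Fault - core dumped",
    "java.lang.OutOfMemoryError: unable to create new native thread",
]

# Hypothetical keyword list (an assumption, not an official classification).
ALLOCATION_HINTS = ("not enough memory", "mmap", "malloc", "outofmemoryerror")


def mentions_allocation(message):
    """Return True if the message contains one of the allocation keywords."""
    lowered = message.lower()
    return any(hint in lowered for hint in ALLOCATION_HINTS)


def tally(messages):
    """Count messages per rough bucket: allocation, killed, other."""
    counts = {"allocation": 0, "killed": 0, "other": 0}
    for message in messages:
        if mentions_allocation(message):
            counts["allocation"] += 1
        elif message.rstrip().endswith("Killed"):
            counts["killed"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    print(tally(SAMPLES))
```

A tally like this says nothing definite about the cause, but the counts might be worth including if someone does file a JIRA ticket.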