Toolserver having a bad day?


DeltaQuad

22 Apr 2012

6:51 a.m.

OK, so I've checked status.toolserver.org and found nothing going on, but some stuff has been breaking at random points today.

I've had a cron job return:

Subject: Cron deltaquad@willow cronsub IPBEBot $HOME/IPBE/IPBE.py

/opt/local/bin/cronsub[44]: 9793 Killed

I've had another return:

Subject: Cron deltaquad@willow cronsub UAABot $HOME/UAA/UAA.py

error: not enough memory to allocate 2404 bytes in init_packbuffer
Unable to run job: Error reading answer list from qmaster.
Exiting.

And then on a MMP (yes this is a customized message):

The x script has failed. The error message received was: A database error occured when attempting to process your request: Failed to connect to database server ! Please check the database to resolve this issue and ensure that private data is removed on schedule.

Is it something we have running, or is this a Toolserver issue in general? And should I file a bug?

(Sysadmins, especially DaB.: please don't take this as me being critical; I just want to help identify any problems and file a JIRA issue if needed ;) )

-- DeltaQuad English Wikipedia Administrator


Krinkle

22 Apr 2012

4:53 p.m.

Over the past 12 hours I've also gotten a fair number of error reports by mail and on IRC about some of my projects.

A few samples:

* pywikipedia script:

Your "cron" job on willow python $HOME/SVN/pywikipedia/fileprotectionsync_live.py > $HOME/bots/py_fileprotectionsync_live.log 2>&1

produced the following output:

ld.so.1: sh: fatal: mmap anon failed: Resource temporarily unavailable
ld.so.1: sh: fatal: /usr/lib/libc.so.1: mmap failed: Resource temporarily unavailable
ld.so.1: sh: fatal: libc.so.1: open failed: No such file or directory

* custom php script for [[m:CVN]]

Your "cron" job on willow php $HOME/CVN-backend/cronjob_cvnapi.php > $HOME/CVN-backend/cronjob_cvnapi.log 2>&1

produced the following output:

Killed

* minutely start attempt for a long-running shell script in SGE

Your "cron" job on willow cronsub -l -s dbbot_wm $HOME/bots/dbbot-wm-start.sh

produced the following output:

/opt/local/bin/cronsub[52]: 28142 Killed

* minutely start attempt for a long-running shell script in SGE

Your "cron" job on willow cronsub -l -s dbbot_wm $HOME/bots/dbbot-wm-start.sh

produced the following output:

critical error: malloc() failure

* wmfCodeSearch exec

Your "cron" job on willow php $HOME/wss_backend/runJobs.php > $HOME/logs/wmfCodeSearch/runJobs.log 2>&1

produced the following output:

Segmentation Fault - core dumped

etc. etc.

Due to the diversity of the errors, I can't find any link between the various failures.

I hope these help in determining/fixing the issue.

-- Krinkle


Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette

Shubinator

5:45 p.m.

I've also had one of my Java bots on willow crash with an OutOfMemoryError (the timestamp is when the exception happened):

Exiting at Sat Apr 21 20:31:45 GMT 2012
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:640)
<snip huge stack trace>

Shubinator
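
The error lines reported so far in this thread can be tallied by symptom. The following is a minimal Python sketch, not something posted by any participant. The bucket names and regular expressions in `PATTERNS` are illustrative assumptions, and a match is only a keyword grouping, not a diagnosis of the root cause.

```python
import re
from collections import Counter

# Error lines copied from the reports quoted in this thread.
SAMPLES = [
    "/opt/local/bin/cronsub[44]: 9793 Killed",
    "error: not enough memory to allocate 2404 bytes in init_packbuffer",
    "ld.so.1: sh: fatal: mmap anon failed: Resource temporarily unavailable",
    "Killed",
    "/opt/local/bin/cronsub[52]: 28142 Killed",
    "critical error: malloc() failure",
    "Segmentation Fault - core dumped",
    'Exception in thread "main" java.lang.OutOfMemoryError: '
    "unable to create new native thread",
]

# Hypothetical symptom buckets (assumption: keyword matching only).
PATTERNS = [
    (
        "allocation failure",
        re.compile(
            r"not enough memory|malloc\(\) failure|mmap .*failed|OutOfMemoryError",
            re.IGNORECASE,
        ),
    ),
    ("killed", re.compile(r"\bKilled\b")),
    ("segfault", re.compile(r"Segmentation Fault", re.IGNORECASE)),
]


def classify(line):
    """Return the label of the first bucket whose pattern matches the line."""
    for label, pattern in PATTERNS:
        if pattern.search(line):
            return label
    return "other"


def tally(lines):
    """Count how many lines fall into each bucket."""
    return Counter(classify(line) for line in lines)


if __name__ == "__main__":
    for label, count in tally(SAMPLES).most_common():
        print(f"{label}: {count}")
```

A line that matches none of the patterns, such as the database connection message, would be classified as `other`.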


Christopher David Howie

23 Apr 2012

5:07 p.m.

On 04/22/2012 12:51 AM, DeltaQuad wrote:

The x script has failed. The error message received was: A database error occured when attempting to process your request: Failed to connect to database server ! Please check the database to resolve this issue and ensure that private data is removed on schedule.

Just to clear this one up, this turned out to be "code got changed in the production environment without going through version control and QA, and consequently broke stuff." I can't speak to the others.

-- Chris Howie
http://www.chrishowie.com
http://en.wikipedia.org/wiki/User:Crazycomputers

If you correspond with me on a regular basis, please read this document: http://www.chrishowie.com/email-preferences/

PGP fingerprint: 2B7A B280 8B12 21CC 260A DF65 6FCE 505A CF83 38F5

------------------------------------------------------------------------
IMPORTANT INFORMATION/DISCLAIMER

This document should be read only by those persons to whom it is addressed. If you have received this message it was obviously addressed to you and therefore you can read it.

Additionally, by sending an email to ANY of my addresses or to ANY mailing lists to which I am subscribed, whether intentionally or accidentally, you are agreeing that I am "the intended recipient," and that I may do whatever I wish with the contents of any message received from you, unless a pre-existing agreement prohibits me from so doing.

This overrides any disclaimer or statement of confidentiality that may be included on your message.
