Cron on submit


Dr. Trigon

3 Feb 2013

11:06 p.m.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello!

There were 2 messages here during January reporting problems with cron. I am now noticing issues with my cronjobs too. Looking at [1], you can see that the strange behaviour started sometime in weeks 2 and 3 (mid January). Is cron (the server) running out of memory again, or what is the issue here? DaB, can you maybe give some hints? Or someone else?

[1] http://munin.toolserver.org/Login/hawthorn/cron_jobs_sh.html

Thanks a lot and greetings! DrTrigon


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlEO31oACgkQAXWvBxzBrDDGugCg1jb5AWvJJfNlBJjebcfOA2cr
tPAAoNtSSs6auQFkp1unbFpEv+Zi07Zu
=u2tk
-----END PGP SIGNATURE-----


DaB.

4 Feb 2013

1:30 a.m.

Hello, At Monday 04 February 2013 01:23:08 DaB. wrote:

...

Hello!

There were 2 messages here during January reporting problems with cron.

Both were on willow AFAIS, which is overloaded.

...

I am now noticing issues with my cronjobs too.

What exactly is the problem?

...

...
Looking at [1], you can see that the strange behaviour started sometime in weeks 2 and 3 (mid January).

Sorry, I don't see anything. All I see is that the maximum number of cronjobs has varied more over the last few weeks (but we are still far from the numbers in autumn if you look at the year graph).

...

Is cron (the server) running out of memory again, or what is the issue here? DaB, can you maybe give some hints? Or someone else?

I checked hawthorn and there are a few memory problems at peak times. I will see if I can add another patch.

...

Thanks a lot and greetings! DrTrigon

Sincerely, DaB.

-- Userpage: [[:w:de:User:DaB.]] — PGP: 2B255885

Dr. Trigon

8:21 p.m.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 04.02.2013 01:30, DaB. wrote:

...

...
I am now noticing issues with my cronjobs too.

What exactly is the problem?

Cronjobs are not getting executed (as usual... ;) - at the moment I also have jobs in the SGE queue that do not get run at all (at least 4).

...

...
...
Looking at [1], you can see that the strange behaviour started sometime in weeks 2 and 3 (mid January).

Sorry, I don't see anything. All I see is that the maximum number of cronjobs has varied more over the last few weeks (but we are still far from the numbers in autumn if you look at the year graph).

Before mid January we had a stable plateau (a more or less constant number of jobs per unit of time). From then on it started breaking down. This is just a guess from looking at the data, but before that it was much more stable...

...

...
Is cron (the server) running out of memory again, or what is the issue here? DaB, can you maybe give some hints? Or someone else?

I checked hawthorn and there are a few memory problems at peak times. I will see if I can add another patch.

Mentioning memory as a possible issue was just a guess, but there is definitely something wrong and not working as usual.

Thanks for your time DaB and greetings! DrTrigon


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlEQCjoACgkQAXWvBxzBrDCs3QCeOAh2dSykUJlB9l1V/ofmbQMI
88EAoM01EEBqs9NjYZOETQsFD4VkyeR8
=gpCx
-----END PGP SIGNATURE-----

Wolfgang ten Weges

10 Feb 2013

11:48 p.m.

A "top" shows that the culprits are likely the same as last time: all the CPU and a lot of process slots (and most probably cron slots) are currently (ab)used by /home/javadyou/pywikipedia/radeh7.py and /home/reza/pywikipedia/radeh.py.

Wolfgang ten Weges/Wolfgang

On 04/02/2013 20:21, Dr. Trigon wrote:

...


Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette

Carl (CBM)

12 Feb 2013

7:35 p.m.

On Sun, Feb 10, 2013 at 5:48 PM, Wolfgang ten Weges koneko@aliceadsl.fr wrote:

...

A "top" shows that the culprits are likely the same as last time: all the CPU and a lot of process slots (and most probably cron slots) are currently (ab)used by /home/javadyou/pywikipedia/radeh7.py and /home/reza/pywikipedia/radeh.py.

There was an announcement on toolserver-l a while back about a new rule that should be in effect now, which should resolve some of these problems:

http://lists.wikimedia.org/pipermail/toolserver-l/2013-January/005625.html

- Carl

Michael Andersen

13 Feb 2013

2:31 p.m.

I just noticed the text shown when you log in:

"Users are now encouraged to use job scheduling (SGE) for *all* tools!"

Perhaps "encouraged" is no longer the right way to write it?

I've been busy and sick, so I did not manage to rewrite my tasks; I stopped them all instead. Perhaps someone could create a tool to extend the number of hours per day? :-D

MGA73

-----Original Message-----
From: toolserver-l-bounces@lists.wikimedia.org [mailto:toolserver-l-bounces@lists.wikimedia.org] On Behalf Of Carl (CBM)
Sent: 12 February 2013 19:35
To: Wikimedia Toolserver
Subject: Re: [Toolserver-l] Cron on submit


Marco Fleckinger

2:42 p.m.

On 02/13/2013 02:31 PM, Michael Andersen wrote:

...

Perhaps someone could create a tool to extend the number of hours per day? :-D

I think this tool would be needed very urgently. Is there any space API or hardware interface specification? :=D

Cheers

Marco

Dr. Trigon

6 Feb 2013

6:21 p.m.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Thanks DaB whatever you (or someone else? ;) did!

Now it works again: as you can see from [1], there was clearly a drop in executed jobs, and now it is at a constant level again! Cool!

[1] http://munin.toolserver.org/Login/hawthorn/cron_jobs_sh.html

Thanks and greetings!! DrTrigon

PS: DaB, what was the solution? I am curious... :)

On 04.02.2013 01:30, DaB. wrote:

...



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlESkSsACgkQAXWvBxzBrDAtVQCfak8Cfgq5u3xOWVHjKzzWUvoY
V/wAn3zx4JBxXZGFIOqKZj59n2irl25G
=aPk/
-----END PGP SIGNATURE-----


toolserver-l@lists.wikimedia.org


participants (6)

Carl (CBM)
DaB.
Dr. Trigon
Marco Fleckinger
Michael Andersen
Wolfgang ten Weges