Hello all,
Today I discovered (thanks to the mailing list) that a few users have been running several bot instances in parallel on willow. I'm sure these people did it by mistake, but it is annoying nevertheless, and it is easy to fix: use SGE.
The problem is that I have already written several emails about "use SGE!" and somehow it did not work as well as it should (if you have already converted your stuff: thank you, and you can stop reading here ;-)). I understand that we are all busy with our lives and Wikipedia and that we all love to "do it right…later", but as you know, the resources of the toolserver are limited. So I hereby declare the following new rule:
All bots have to run via SGE. A bot is any program or script that makes changes to a Wikimedia project. It does not matter whether the bot runs periodically or continuously. The only exceptions are a) interactive bots, b) bots that can't run under SGE yet, and c) bots that you start by hand for testing (no screen, no cron, no while loop). The rule takes effect on Sunday, 10 February 2013. Exception b is almost NEVER the case: if it runs in a shell, it is VERY likely that it can run under SGE.
Some time ago I wrote a simple SGE how-to at [1]. Maybe you can all take a look, correct things, and make things clearer. In most cases using SGE IS easy.
Sincerely, DaB.
[1] https://wiki.toolserver.org/view/SGE_for_beginners
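Editor's sketch, not part of the original mail: one way a cron-started bot might be wrapped as an SGE job script. Standard SGE `qsub` reads option lines that begin with `#$` from the job script; whether Toolserver's own wrappers treat them the same way is an assumption here. The job name `examplebot`, the resource values, the output path, and the `BOT_CMD` variable are all hypothetical placeholders.

```shell
#!/bin/sh
# examplebot.sh: hypothetical SGE job script. Every name and value below is a
# placeholder for illustration, not something taken from the thread.
#
# Lines starting with "#$" are embedded submit options: job name, runtime
# limit, memory request, output file, merged stderr, mail on abort/end.
#$ -N examplebot
#$ -l h_rt=0:30:00
#$ -l virtual_free=100M
#$ -o $HOME/examplebot.out
#$ -j y
#$ -m ae

# Command that does the real work. It can be overridden from the environment
# so the script can be tried by hand before it is submitted.
BOT_CMD=${BOT_CMD:-"echo examplebot: nothing to do"}

run_bot() {
    echo "start: $(date -u '+%Y-%m-%d %H:%M:%S') UTC on $(uname -n)"
    $BOT_CMD        # unquoted on purpose so arguments inside BOT_CMD are split
    status=$?
    echo "end: exit status $status"
    return "$status"
}

run_bot
```

Because the `#$` lines are ordinary comments to the shell, the same file can be run directly for a quick test; under SGE the start and end lines land in the output file, which makes a failed run easier to trace than an exit code alone.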
On Sat, Jan 12, 2013 at 6:01 AM, DaB. WP@daniel.baur4.info wrote:
All bots have to run via SGE. A bot is any program or script that makes changes to a Wikimedia project. It does not matter whether the bot runs periodically or continuously. The only exceptions are a) interactive bots, b) bots that can't run under SGE yet, and c) bots that you start by hand for testing (no screen, no cron, no while loop). The rule takes effect on Sunday, 10 February 2013. Exception b is almost NEVER the case: if it runs in a shell, it is VERY likely that it can run under SGE.
Is TS-1479[1] a valid exception B for not using SGE for some specific scripts?
[1] https://jira.toolserver.org/browse/TS-1479
-Liangent
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
Hello, At Friday 11 January 2013 23:42:03 DaB. wrote:
Is TS-1479[1] a valid exception B for not using SGE for some specific scripts?
yes. But I will speak with Merlissimo about a fix for this (AFAIR he had a problem with the given patch).
Sincerely, DaB.
On 11/01/13 23:43, DaB. wrote:
yes. But I will speak with Merlissimo about a fix for this (AFAIR he had a problem with the given patch).
It looks good to me (note it is *not* using embedded quotes as suggested in comment #1).
On 11/01/13 23:01, DaB. wrote:
c.) if you start a bot by hand for testing (no screen, no cron, no while).
Does this forbid running a bot by hand in an attached screen? I.e., you run screen because you are afraid your local computer/connection could reset, not to ‘run & forget’. You would be attached e.g. 90% of the time and pay (some) attention to the screen output...
Hello, At Monday 11 February 2013 14:20:25 DaB. wrote:
Does this forbid running a bot by hand in an attached screen? I.e., you run screen because you are afraid your local computer/connection could reset, not to ‘run & forget’. You would be attached e.g. 90% of the time and pay (some) attention to the screen output...
And how should I decide what to do with such a bot when I find it? It could be a "run & forget" bot or a "my connection is unstable" bot. Even checking whether you are logged in would not help, because you could just be in the unstable part of your connection. So if you find a way: you are allowed to do it. But I would prefer that you just put the bot in SGE and then watch the output (file).
BTW: I "fixed" jira tonight – it is now just as broken as before amaranth crashed.
Sincerely, DaB.
Hello all,
In the last 3 days I spent a few hours a day enforcing the new rule (starting on willow). I wrote many emails and commented out even more cron lines. I learned a few things doing so (for example, that some users think 1 cron line per bot is not enough, that some users still use our old phoenix and newtask programs, and that some users seem to do cron-task sharing…). So far nobody has lost their account, but I killed all misbehaving bots. The load on willow is now appreciably lower than before (the reboot helped there too, of course). It's a more or less boring task, and you would REALLY help me if you converted your stuff to SGE yourself before I kill and disable your bot. If you find that your bot was disabled, you are allowed to re-enable it IF you convert it to SGE FIRST! Don't make me find a bot I disabled before running without SGE again – you and I would hate that.
To say something positive: I also found bots using SGE (few, but I found some).
Sincerely, DaB.
Hi
I have two tools that I'd like to convert to SGE now. Both are written in PHP. I tried the following but it didn't work out:
mazder@willow:~$ qcronsub -l h_rt=0:05:00 -l virtual_free=20M \
    -l arch=* -l sql-user-m -N mazder-replicate-sequences -m ae \
    '/home/mazder/public_html/replicate-sequences/update.php'
I only get a mail, telling me the exit code was 255. No more error messages or stdout/stderr output.
The only reason I can imagine is that the script tries to read "/home/".get_current_user()."/.my.cnf" in order to get the MySQL username & password. Could it be that this can't work on the host that runs the job?
Peter
Yes, when using get_current_user() the returned user name is "sgeadmin". Try changing your script to something like what is proposed in the toolserver wiki: https://wiki.toolserver.org/view/Database_access#PHP
Cheers, Kai
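Editor's sketch, not part of the original mail: the point above is that the user name seen inside the job may not be the one the script expects, so a path built from it misses the credentials file. The shell diagnostic below illustrates the general idea of deriving the home directory from the UID of the running process; it is not the PHP code from the wiki page, and the helper name `resolve_home` is hypothetical. It assumes `getent` may or may not be present and falls back to `$HOME`.

```shell
#!/bin/sh
# Diagnostic sketch: show which identity the current process has and whether
# a .my.cnf can be read from that identity's home directory.

# Print the home directory of the current UID according to the passwd
# database; fall back to $HOME when getent is missing or has no entry.
resolve_home() {
    uid=$(id -u)
    home=""
    if command -v getent >/dev/null 2>&1; then
        home=$(getent passwd "$uid" | cut -d: -f6) || home=""
    fi
    printf '%s\n' "${home:-$HOME}"
}

home_dir=$(resolve_home)
echo "running as: $(id -un) (uid $(id -u))"
echo "home directory: $home_dir"
if [ -r "$home_dir/.my.cnf" ]; then
    echo ".my.cnf is readable"
else
    echo ".my.cnf not found or not readable in $home_dir"
fi
```

Submitted as a job, a script like this would show in its output file which user and home directory the job really sees, which is often quicker than guessing from an exit code.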
Hi
On 19.02.2013 17:38, Kai Nissen wrote:
Yes, when using get_current_user() the returned user name is "sgeadmin". Try changing your script to something like what is proposed in the toolserver wiki: https://wiki.toolserver.org/view/Database_access#PHP
Thank you both, works now.
Peter
Hello, At Tuesday 19 February 2013 17:49:19 DaB. wrote:
mazder@willow:~$ qcronsub -l h_rt=0:05:00 -l virtual_free=20M \
    -l arch=* -l sql-user-m -N mazder-replicate-sequences -m ae \
    '/home/mazder/public_html/replicate-sequences/update.php'
In addition to Kai's answer: it is "-l sql-user-m=1" (see [1]).
I only get a mail, telling me the exit code was 255. No more error messages or stdout/stderr output.
There should be an error log somewhere in your home directory now, but it is better to use the "-o" parameter to define WHERE the output should be redirected to (see [2]).
Sincerely, DaB.
[1] https://wiki.toolserver.org/view/SGE#Optional_resources
[2] https://wiki.toolserver.org/view/SGE_for_beginners#Notification_and_logging
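Editor's sketch, not part of the original mail: the two corrections from this reply ("sql-user-m=1" and an explicit "-o" target) applied to the command quoted above. The log path is a hypothetical choice, the quoting of 'arch=*' is an added precaution against shell globbing, and the `submit` helper with its `DRY_RUN` switch is illustrative only; by default it prints the command instead of running it.

```shell
#!/bin/sh
# Dry-run sketch: print the corrected submission instead of executing it.
# Set DRY_RUN=0 on a host where qcronsub is available to really submit.

JOB_NAME="mazder-replicate-sequences"
SCRIPT="/home/mazder/public_html/replicate-sequences/update.php"
LOG_FILE="${HOME}/replicate-sequences.out"   # hypothetical log location

submit() {
    # "$@" keeps every argument intact, including the quoted 'arch=*'.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s ' "$@"
        printf '\n'
    else
        "$@"
    fi
}

submit qcronsub \
    -l h_rt=0:05:00 -l virtual_free=20M \
    -l 'arch=*' -l sql-user-m=1 \
    -N "$JOB_NAME" -m ae \
    -o "$LOG_FILE" \
    "$SCRIPT"
```

With the output file named explicitly, anything the script writes to stdout can be read from that file after a failed run.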