I will be performing dumps maintenance this weekend (10/11 March).
No files are going to be deleted. However, because files will be moved, any running scripts might be affected.
For a transitional period, symlinks will be left behind. In the future, though, only the dedicated directories should be used for dumps, so all symlinks will eventually be deleted. This will be announced far enough in advance that you will have time to update your scripts.
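Before the symlinks are removed, it may help to check which dump paths your scripts still reach through a symlink. Below is a minimal Python sketch. It assumes you collect the paths your scripts use yourself. The function name `symlinked_paths` is hypothetical and not a Toolserver tool.

```python
import os

def symlinked_paths(paths):
    """Map each path that goes through a symbolic link (in any component)
    to the real location it currently resolves to."""
    result = {}
    for p in paths:
        real = os.path.realpath(p)       # resolves every symlink on the way
        if real != os.path.abspath(p):   # abspath normalizes but keeps links
            result[p] = real
    return result
```

Any path that appears in the result should be replaced in your scripts by its resolved target in the dedicated dump directory.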
Also, in round two, an automatic dump update service will be available for requested dumps, so you will no longer have to take care of downloading and updating them yourself.
Once the maintenance is finished, I'll announce the results and possible dependencies.
Can someone verify the integrity of the page table for me, please?
mzmcbride@willow:~$ sql enwiki_p;
mysql> select * from page where page_namespace = 6 and page_title =
Empty set (0.00 sec)
mysql> select * from page where page_namespace = 0 and page_title =
Empty set (0.00 sec)
To me, this indicates that either the master has some very serious problem
(unlikely) or that the Toolserver's copy of the database has become corrupt
(much more likely). Can anyone verify my findings?
Hi there. I set up a cron job to run *clean_sandbox.py* every 6 hours. I
made a modified copy of *clean_sandbox.py* for myself and uploaded it to my
Toolserver account (outside the pywikipedia folder).
The problem is that I can't write the proper job script. This is what I wrote:
#$ -j y
#$ -o /dev/null
python clean_sandbox.py -lang:"ar" -family:wikipedia
But it didn't work.
Hello toolserver users,
as you may know, there have been some bigger problems with Sun Grid Engine since November 2011. I asked DaB. to make me an SGE manager so that I could help solve these problems.
Over the last months I quietly started reconfiguring SGE in small steps, so that it could always be used as before and no downtime was needed. This took some time because I am only a
volunteer and I had to change nearly everything. Additionally, Nosy and DaB. changed some Solaris configurations that I proposed.
All scripts that used grid engine before can continue to run without changes. However, you may be able to improve your scripts' performance by providing additional information.
In the past you were asked to choose a suitable queue (all.q or longrun) for your job. Many people chose a queue that was not the best fit for their task, so I changed this procedure.
Now you have to specify all resources that your job needs at runtime when you submit it. SGE will then choose the queue and host that best fit your requirements, so you don't have to care about
different queues anymore (you may have noticed that there are many more queues than before).
All jobs should at least specify their maximum runtime (h_rt) and peak memory usage (virtual_free). This information may become mandatory in the future; currently only a warning message is shown.
You also have to request other resources, such as SQL connections or free temp space, if your job needs them. Please read the documentation on the Toolserver wiki, which I updated today:
It currently contains the main information you need to know, but I may add more examples later.
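Here is a minimal sketch of what "specifying" these resources looks like on a qsub-style command line. The helper below is hypothetical and not part of qcronsub or SGE. It lists which of h_rt and virtual_free are missing from an argument list. It accepts both repeated `-l` options and a single comma-separated `-l` list.

```python
# Resources the announcement says every job should request.
REQUIRED = ("h_rt", "virtual_free")

def requested_resources(argv):
    """Collect resource names passed as "-l name=value[,name=value...]"
    in a qsub/qcronsub argument list."""
    names = set()
    for flag, value in zip(argv, argv[1:]):  # consecutive (option, value) pairs
        if flag == "-l":
            for item in value.split(","):
                names.add(item.split("=", 1)[0].strip())
    return names

def missing_required(argv):
    """Return the required resources that argv does not request."""
    have = requested_resources(argv)
    return [r for r in REQUIRED if r not in have]
```

For example, `missing_required(["-N", "mainbot", "mainbot.py"])` reports both resources as missing.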
I have also added a new script called "qcronsub". It replaces the "cronsub" script most of you used before. Unlike cronsub, it accepts the same arguments as grid engine's original "qsub" command,
so it is now possible to add all resource values on the command line.
Please note that you should always use cronie on submit.toolserver.org to submit jobs to SGE from cron. These cron tasks will always be executed, even if one host (e.g. clematis or willow) is down.
This has been the recommended usage for about 17 months. Many people have migrated their cron jobs from nightshade to willow during the last weeks, but they will have the same problem again if willow has to
be shut down for a longer time (which hopefully never happens).
This morning Dr. Trigon complained that his job "mainbot" did not run immediately and was queued for a long time. I would guess he submitted his job from cron using "cronsub mainbot -l
This indicates that the job runs forever (longrun) with unknown memory usage, so grid engine was only able to start this job on willow.
It is not possible to run infinite jobs on the webservers. Only shorter jobs are allowed there, so that most jobs have finished before the high webserver usage expected in the evening. Nor was it possible
to run it on the server running the mail transfer agent, which has less than 500 MB of free memory but plenty of CPU power (and the job's expected memory usage is unknown). Other servers like nightshade and yarrow aren't
According to the last run of this job, it took about 2 hours and 30 minutes and had a peak memory usage of 370 MB. I got these values by querying grid engine for the usage statistics of the
last ten days: "qacct -j mainbot -d 10".
To make sure the job always gets enough resources, I would suggest raising the values to 4 hours and 500 MB of memory. It is not a problem if you request more resources than you really need, but jobs
needing more resources than requested may be killed. So the new submit command would be:
"qcronsub -N mainbot -l h_rt=4:00:00 -l virtual_free=500MB /home/drtrigon/pywikipedia/mainbot.py"
This job could run on both webservers during low load, and on willow. Grid engine also knows that it cannot run on the mail servers because of its high memory usage.
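The suggested values can be reproduced with a simple rule of thumb. The 1.3 safety margin and the rounding (whole hours, next 100 MB) below are assumptions chosen to match the numbers above; they are not a grid engine rule. The helper names are hypothetical.

```python
import math

def suggest_limits(runtime_s, peak_mb, margin=1.3):
    """Hypothetical rule of thumb: add a safety margin to the observed usage,
    then round the runtime up to whole hours and the memory up to the next 100 MB."""
    hours = math.ceil(runtime_s * margin / 3600)
    mem_mb = math.ceil(peak_mb * margin / 100) * 100
    return f"{hours}:00:00", f"{mem_mb}MB"

def qcronsub_argv(name, script, h_rt, virtual_free):
    """Build a qcronsub argument list with explicit runtime and memory requests."""
    return ["qcronsub", "-N", name,
            "-l", f"h_rt={h_rt}",
            "-l", f"virtual_free={virtual_free}",
            script]

# Observed by qacct: about 2 h 30 min (9000 s) and a 370 MB peak.
h_rt, mem = suggest_limits(9000, 370)
print(" ".join(qcronsub_argv("mainbot", "/home/drtrigon/pywikipedia/mainbot.py", h_rt, mem)))
```

This prints the same command as above. Rounding up rather than to the nearest value keeps the request on the safe side, since over-requesting is harmless but under-requesting may get the job killed.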
The job "ircbot" by drtrigon was started on a mail server last night. This job really needs an infinite runtime (-l h_rt=INFINITY), but it only uses little memory (40 MB).
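To illustrate why the three submissions end up on different hosts, here is a toy model of resource-based host selection. It is not how SGE schedules internally. All host limits except the mail server's "less than 500 MB free" are made-up figures, and load (e.g. evening webserver usage) is ignored.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Host:
    name: str
    max_runtime_h: float  # longest job the host accepts (math.inf = no limit)
    free_mb: float        # memory that can still be granted to jobs

# Hypothetical figures; only "mail server < 500 MB free" comes from the post.
HOSTS = [
    Host("webserver", max_runtime_h=12, free_mb=4000),
    Host("mailserver", max_runtime_h=math.inf, free_mb=450),
    Host("willow", max_runtime_h=math.inf, free_mb=8000),
]

# Assumed conservative default when a job does not state its memory use.
UNKNOWN_MEM_MB = 1000

def eligible_hosts(runtime_h: float, mem_mb: Optional[float]):
    """Return the names of hosts whose limits cover the job's request."""
    need_mb = UNKNOWN_MEM_MB if mem_mb is None else mem_mb
    return [h.name for h in HOSTS
            if runtime_h <= h.max_runtime_h and need_mb <= h.free_mb]

print(eligible_hosts(math.inf, None))  # old mainbot: infinite, unknown memory
print(eligible_hosts(4, 500))          # new mainbot request
print(eligible_hosts(math.inf, 40))    # ircbot
```

In this model the old mainbot submission fits only willow. The new request also fits the webservers, and ircbot fits the mail server, which matches the behaviour described above.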
Jobs that have a limited runtime should not be submitted with an infinite runtime value, even if the expected runtime is several days or weeks. For example, pywikipedia scripts should be updated regularly from
svn, so they must end after some days and be restarted. For instance, "qcronsub -l h_rt=120:0:0 scriptname" submits a job with a maximum runtime of five days.
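To check what a given h_rt value means, here is a small hypothetical converter. It assumes the hours:minutes:seconds form used above, plain seconds, and INFINITY. Other time formats SGE may accept are not handled.

```python
import math

def h_rt_seconds(value: str) -> float:
    """Convert an h_rt value such as "4:00:00" or "120:0:0" to seconds.
    "INFINITY" maps to math.inf; only h:m:s and plain seconds are handled."""
    if value.upper() == "INFINITY":
        return math.inf
    parts = [int(p) for p in value.split(":")]
    if len(parts) == 1:
        return parts[0]
    if len(parts) != 3:
        raise ValueError(f"unsupported h_rt value: {value!r}")
    hours, minutes, seconds = parts
    return hours * 3600 + minutes * 60 + seconds

print(h_rt_seconds("120:0:0") / 86400)  # maximum runtime in days
```

For "120:0:0" this gives 432000 seconds, i.e. five days.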
If you have any questions about grid engine usage feel free to ask me or the toolserver admins on irc or mailing list.
The Toolserver grid currently uses four servers and still has plenty of CPU power and memory available. Only willow is currently very busy. Please do not run processes on servers other than the login servers
(willow and nightshade) without SGE resource control (the exception is cronie on host submit, for submitting jobs to grid engine).
As far as I know, nothing has been settled so far regarding the new MMP interwiki bot, so I suggest that *WE* (TS users, as DaB said) speed up this step... It would be a pity if nothing had been done by April 2nd, when the new rule 9.4 takes effect (no continuous interwiki bots on the TS)...
But how do we choose the users who will be in the MMP and who will actually be fully responsible for the bot? I think an election would be a bit difficult to organize, but if someone with spare time feels like organizing one, why not?
So I suggest that people interested in getting involved in the project let all TS users know by emailing the toolserver-l list... Then TS users could support or criticize one or more candidates until a consensus is found... This way, the people chosen would still have a few weeks left (about a fortnight, I think, but it depends on how long the debate lasts).
Here is my plan... What do you think about it? Any other ideas?
Personally, I've run a globally flagged interwiki-bot (ZéroBot) for a while, under a script I wrote (but still using interwiki.py from the pywikipediabot framework), and I'm pretty interested in being a member of the MMP...
The login server Willow seems to be having load issues. Several commands appear to be failing because of the load. For example, a 'dir' command returns '-bash: fork: Not enough space' about 50% of the time. These issues have been reported by numerous people in the IRC channel over the last 10 or so minutes (as well as by tsnag, which was practically spamming).
This is a heads-up email; hopefully the ops can look into this issue and fix it.