Hello!
There were two messages here during January reporting problems with cron.
I am now noticing issues with my cron jobs too. Looking at [1], you
can see that the strange behaviour started sometime between weeks 2 and
3 (mid-January). Is cron (the server) running out of memory again, or
what is the issue here? DaB, can you maybe give some hints? Or someone
else?
[1] http://munin.toolserver.org/Login/hawthorn/cron_jobs_sh.html
Thanks a lot and greetings!
DrTrigon
Hello all,
large parts of the toolserver cluster were down or very slow in the last few
hours. AFAIS it was a problem with the user-store or rosemary (to which the
user-store is physically connected). I rebooted rosemary, but the reboot showed
problems with its IPv6 address. I tried to fix that, which caused several more
reboots. Rosemary is now up and running, but the user-store is not available
(it looks like Nosy just mounted it without updating the fstab file). So I was
forced to remove the user-store everywhere (except on willow, because it needs
a reboot to do that and a reboot is already scheduled for later today).
I will try to find the partition for the user-store and mount it, but I do not
have much hope (there are way too many devices to try). Just to be clear: no
data is lost. Munin will also be away, because its data is also mounted on
that host. I fear that we have to wait for Nosy to recover before we get the
user-store back.
tl;dr: TS had problems, user-store is away.
Sincerely,
DaB.
--
Userpage: [[:w:de:User:DaB.]] — PGP: 0x2d3ee2d42b255885
Hello all,
while I was killing some bot-processes on willow to reduce the high load I
accidentally pressed return too early and killed a random number of processes
with that. I restarted the system-processes, but I am not sure if everything
is completely right. Just to be sure I hereby announce a reboot for tomorrow,
Monday, 19:05 UTC.
Willow will be away for some minutes. Please note that past experience shows
that cron on Solaris does not start all processes after a reboot, so you
should check after the reboot whether everything works. Please also note that
in a few minutes the new "no bots without SGE" rule ([1]) becomes active, so
please make sure that your bot uses SGE or I might disable it.
I have no idea how many user processes were killed, but I am sorry that it
happened.
Sincerely,
DaB.
[1] http://lists.wikimedia.org/pipermail/toolserver-announce/2013-January/000557.html
--
Userpage: [[:w:de:User:DaB.]] — PGP: 0x2d3ee2d42b255885
Hi everyone,
Wikimedia Nederland invites all developers to the Amsterdam Hackathon 2013.
The hackathon is an opportunity for all Wikimedia community developers
and sysadmins to come together, squash bugs and write great new features
& tools. Unlike the previous years (2012, 2011, etc.) this Hackathon
won't be in Berlin, but in Amsterdam.
The event is open to a wide range of developers. We welcome both
seasoned and new developers as well as people working on MediaWiki,
tools, pywikipedia, gadgets, extensions, templates, etc.
It takes place from 24 to 26 May. If you’d like to attend, please save the
date!
There will not be an entrance fee for the event itself, but
registration is mandatory. There will be a limited number of
scholarships available, details to be provided ASAP. We're currently
finalizing the arrangements of the venue. When we're done with that
we'll open registration.
Keep an eye on https://www.mediawiki.org/wiki/Amsterdam_Hackathon_2013
for updates!
Maarten
P.S. Please spread the word.
Hello all,
for historical reasons s2 and s5 are together on one host (cassia). Because
cassia is quite overloaded, the sharing will end soon and I will move s2 away.
For this I need your help, because s2 and s5 also share the user-databases and
there is no hint as to which user-database is needed where.
So if you use user-databases for joining with s2 (two!), please add the name of
the user-database to [1] by
Friday, 8 February, 18:00 UTC.
It will take only a few minutes to add your user-databases there, so please do
it. If you do not add your user-databases there, your tools will break after
the split, but of course that can be fixed later.
Sincerely,
DaB.
[1] https://wiki.toolserver.org/view/User:Dab/s2-userdatabaes
--
Userpage: [[:w:de:User:DaB.]] — PGP: 2B255885
Hello all,
around 3 o'clock UTC we lost connection to amaranth, our server in Tampa which
handles the connection to the WMF database servers. So far it is unclear
whether it is a server problem or a connection problem. I have tried to reach
the WMF techs, but have had no response yet. I will keep you updated by mail,
because JIRA is also hosted on amaranth and is therefore down as well.
Sincerely,
DaB.
--
Userpage: [[:w:de:User:DaB.]] — PGP: 2B255885
Hi,
https://jira.toolserver.org/browse/TS-1599 says inter alia:
| This is not the first time I try to open this ticket at
| Jira. The last two times I tried, the message "You are not
| authorized to perform this operation. Please try to log in
| or sign up for an account. Close this dialog and press
| refresh in your browser" appeared and the whole ticket was
| gone and I got logged out automatically. wtf? It's really
| disappointing that there seems to be no way to restore the
| whole story I'd written. :-((
This is still the same issue as reported by Krinkle at
http://permalink.gmane.org/gmane.org.wikimedia.toolserver/5506
in November. The backlog in JIRA for various other issues
in the Toolserver project is quite impressive as well.
So we should add more roots, ideally of course Solaris/Linux
bilinguals with 20+ years of HA and MySQL replication
experience and lots of spare time on their hands, but in
practice any bright mind who can track down some bug, update
the puppet configuration and take care of all the other
tidbits while documenting their work meticulously, so that
the roots can focus on the more complicated stuff.
Silke, what are the requirements WMDE imposes on toolserver
admins? Being of legal age in their country of citizenship,
residence and Germany? Disclosing their identity to WMDE?
Anything else?
Tim
Hello,
I have become sick, as has my baby, who has a high fever, and it seems the rest of the family will follow.
Yesterday I was already in the hospital with the baby, but it won't work without me thinking through their treatments.
She has a resistant bacterium that the doctors wanted to treat with an antibiotic that will not work on it. Fortunately, I
pushed the outpatient doctor to test for the bacterium a few days ago.
I feel feverish too, and we will now see whether there is a second kind of bacterium or whether it is "just" the one resistant strain.
This means I am offline now, or online only rarely. If the toolserver is offline, please contact me by phone (a short message will work too); you can find the number in the wiki.
Please keep in mind that willow and nightshade are quite overloaded, and try to eliminate as much load from your tools as you can.
AFAIS we have a lot of load from Iran; please keep an eye on that.
What Tim wrote about another admin is exactly right.
Cheers
Marlen/nosy
Hi,
for those of you who have not seen TS-1553: mail forwarding
seems to have stopped working. So if you haven't received
the usual job reports that you were expecting, you might
want to log in to all servers and check whether there is
mail for you. You can query all servers with:
| for SERVER in clematis hawthorn nightshade ortelius willow wolfsbane yarrow; do
|   ssh "$USER@$SERVER.toolserver.org" ls -l "/var/mail/$USER"
| done
replacing $USER with your username.
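A slightly more defensive variant of the loop above, offered as a minimal
sketch rather than a tested Toolserver recipe: it only reports whether each
mailbox is non-empty, and it separates "no mail" from "could not connect".
The names TS_SERVERS, TS_USER, SSH_CMD and check_mail are made up for this
example; the host list and the /var/mail path are taken from the loop above,
and the sketch assumes that `test -s` is available in the remote shell.

```shell
# Hypothetical helper: all names below are illustrative, not Toolserver tooling.
TS_SERVERS="clematis hawthorn nightshade ortelius willow wolfsbane yarrow"
TS_USER="${TS_USER:-${USER:-}}"   # override if your Toolserver login differs
SSH_CMD="${SSH_CMD:-ssh}"         # replaceable, e.g. for a dry run

check_mail() {
    for server in $TS_SERVERS; do
        status=0
        # "test -s" succeeds only if the mailbox exists and is non-empty.
        # BatchMode avoids hanging on a password prompt; -n keeps ssh off stdin.
        $SSH_CMD -n -o BatchMode=yes -o ConnectTimeout=10 \
            "$TS_USER@$server.toolserver.org" \
            "test -s /var/mail/$TS_USER" || status=$?
        case $status in
            0) echo "$server: mail waiting" ;;
            1) echo "$server: no mail" ;;
            # ssh itself exits with 255 when the connection fails
            *) echo "$server: could not check (ssh exit status $status)" ;;
        esac
    done
}
```

After sourcing the snippet, calling check_mail prints one line per server;
set TS_USER first if your Toolserver login differs from your local username.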
Tim