Hello all,
as announced, we have set up a temporary workaround for the user-store until
hemlock is fixed. It should work everywhere now except on willow: I can't
mount or unmount the user-store there, and the directory looks strange. So I
hereby announce a reboot of willow for tomorrow,
Friday, 16:00 UTC.
The downtime should be about 10 minutes. You can follow the progress at [1].
[1] https://jira.toolserver.org/browse/MNT-1272
--
Userpage: [[:w:de:User:DaB.]] — PGP: 2B255885
Hello all,
hemlock had another disk problem and needed an emergency reboot. It is back in
action now (more or less), but it lost one of its partitions, the one that
carries the user-store. A filesystem check (fsck) is running at the moment and
will take a while to complete (I guess the whole European night). Most things
should work, but nothing that needs /mnt/user-store will.
I will send an update when the user-store is back.
Sincerely,
DaB.
---------- Forwarded message ----------
Subject: Re: [Toolserver-l] /mnt/user-store is away
Date: Wednesday 03 October 2012
From: Marlen Caemmerer <marlen.caemmerer(a)wikimedia.de>
To: Wikimedia Toolserver <toolserver-l(a)lists.wikimedia.org>
Hello,
DaB added the partition to NFS after the fsck, so the user-store has been
available again since 3:45 UTC.
Cheers
nosy
_______________________________________________
Toolserver-l mailing list (Toolserver-l(a)lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list:
https://wiki.toolserver.org/view/Mailing_list_etiquette
-------------------------------------------------------------
Hello all,
as announced last Sunday, I hereby announce a maintenance window for the
web servers on Monday, 20:00-22:00 UTC.
I will reboot hemlock a few times to try to find out why the web servers stop
working when hemlock is down (and if I find the cause, I will fix it). All web
tools will fail while hemlock is (re)booting; other subsystems (like SGE)
should work normally.
Sincerely,
DaB.
Hello,
as tomorrow is a maintenance window anyway, I will add more disk space to s2 and s5.
While the work is in progress, the databases s2 and s5 will not be available.
This will take about 1-1.5 hours, and I will do it while DaB checks the interaction
between hemlock and the web servers, between 20:00 and 22:00 UTC.
Cheers
nosy
Hello all,
because of a kernel upgrade I have to reboot our Linux boxes (nightshade,
yarrow and mayapple). This will happen tomorrow,
Monday, 19:05 UTC.
I will reboot the boxes one after the other; each reboot should take no more
than 10 minutes. If you use SGE (as you should), your jobs will either
migrate to another box or be restarted automatically. If you have files open
(for example in an editor), you should close them.
You can follow the progress at [1].
Sincerely,
DaB.
[1] https://jira.toolserver.org/browse/MNT-1268
Hello all,
to post something non-meta for a change: I restarted MySQL on z-dat-s3-a to
get hyacinth out of swap. sql-s3 was down for 1.5 hours because the shutdown
was very slow.
Sincerely,
DaB.
Hello,
On Sunday 23 September 2012 21:01:27 DaB. wrote:
> Hello,
>
> On Sunday 23 September 2012 20:30:29 DaB. wrote:
> > For about an hour, the web servers have been unresponsive:
> >
> > * http://ortelius.toolserver.org/~cvn/index.html
> > * http://wolfsbane.toolserver.org/~cvn/index.html
> > * https://toolserver.org/~cvn/index.html
> >
> > All of them fail with no response and time out.
> >
> > I can still SSH into wolfsbane and ortelius from willow, though.
>
> I will now investigate this. So far, the only problem I have found is that
> hemlock is down.
I have restored web access now. As far as I can see, hemlock lost its external
array and ran out of memory around 2:30 UTC. I have no idea why this affects
our web servers. I rebooted hemlock to free the memory and restarted the web
servers on ortelius and wolfsbane; as far as I can see, the web pages are back.
What is not working at the moment are the user-store and our backup, because
both are on hemlock's external array. Munin, which is handled by hemlock, is not
working either. I will try to fix all this, but I guess I will need nosy for
that (and in the worst case Mark at the colo).
>
> Sincerely,
> DaB.
Sincerely,
DaB.
---------- Forwarded message ----------
Subject: Re: [Toolserver-l] z-dat-s4-a (s4-user) is down (was: Re: Reboot of
hyacinth s3/s6/s7)
Date: Wednesday 19 September 2012
From: Marlen Caemmerer <marlen.caemmerer(a)wikimedia.de>
To: Wikimedia Toolserver <toolserver-l(a)lists.wikimedia.org>
Hello,
I had a bad accident while resizing the volume for s4-user.
Unfortunately, I had not realized that s4-rr did not yet hold the s4-user
databases.
I have restored the backup of the user databases into this instance, so
s4-user should be usable again.
Cheers
nosy
-------------------------------------------------------------
Hello all,
Nosy rebooted hyacinth this morning (see below). As far as I can see, something
went wrong with the SQL partition of s4, but I have no details yet. I have to
speak with Nosy first; until then, sql-s4-user is down.
sql-s4-rr is operating normally.
Sincerely,
DaB.
On Tuesday 18 September 2012 16:16:51 DaB. wrote:
> Hello,
>
> I will reboot the database server hyacinth, which holds s3, s6 and s7,
> tomorrow at 6:30 UTC.
>
> Cheers
> nosy
>