Hello all,
Around 3 o'clock UTC we lost the connection to amaranth, our server in Tampa that handles the connection to the WMF database servers. So far it is unclear whether this is a server problem or a connection problem. I have tried to reach the WMF techs, but there has been no response yet. I will keep you updated by mail, because JIRA is also hosted on amaranth and is therefore down as well.
Sincerely, DaB.
Hi Daniel,
Did some poking. Thanks to Andrew Otto and Chris Johnson the server is back online (hard reboot). It looks like it has some faulty hardware: http://nagios.toolserver.org/cgi-bin/status.cgi?host=amaranth
Can you restart the replication?
Maarten
On 7-2-2013 15:25, DaB. wrote:
Hello all,
Around 3 o'clock UTC we lost the connection to amaranth, our server in Tampa that handles the connection to the WMF database servers. So far it is unclear whether this is a server problem or a connection problem. I have tried to reach the WMF techs, but there has been no response yet. I will keep you updated by mail, because JIRA is also hosted on amaranth and is therefore down as well.
Sincerely, DaB.
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
Hello, On Thursday 07 February 2013 23:15:23 DaB. wrote:
Hi Daniel,
Did some poking.
Thanks for that.
Thanks to Andrew Otto and Chris Johnson the server is back online (hard reboot).
Yes, it is online and working again :-). Thanks to everyone involved!
It looks like it has some faulty hardware: http://nagios.toolserver.org/cgi-bin/status.cgi?host=amaranth
Yes, the order for the replacement got lost in the WMDE office, but they are working on it now.
Can you restart the replication?
Done, with the exception of wikidata and commons on cassia. I will look into those now.
Maarten
Sincerely, DaB.
Hello!
It looks like it has some faulty hardware: http://nagios.toolserver.org/cgi-bin/status.cgi?host=amaranth
Yes, the order for the replacement got lost in the WMDE office, but they are working on it now.
Well, no: WMDE *did* inform WMF about this issue several times last year. It is hard for me to trace why the replacement didn't happen then, and to me it is more important to make it happen now.
(And when I took over the coordination, I wasn't aware that this still needed a follow-up, because I thought Tampa was history, which was obviously wrong.)
Best,
Hello, On Friday 08 February 2013 14:18:03 DaB. wrote:
Well, no: WMDE did inform WMF about this issue several times last year. It is hard for me to trace why the replacement didn't happen then, and to me it is more important to make it happen now.
OK, I tried to phrase it diplomatically. The truth is that WMDE was not able to replace the broken hardware in more than 5 months, although I told them several times that it is broken and that this is important. It doesn't really matter whether WMDE just did not care, was busy, or the WMF did not respond.
(And when I took over the coordination, I wasn't aware that this still needed a follow-up, because I thought Tampa was history, which was obviously wrong.)
AFAIK you are a sysop yourself. So all you needed to do was look at our Nagios, which would have told you that a) amaranth is still active and b) its hardware is still broken. Or you could have asked Nosy, me, or Sebastian.
Sincerely, DaB.