Replication stopped

DaB.

7 Feb 2013

11:25 a.m.

Hello all,

Around 3 o'clock UTC we lost the connection to amaranth, our server in Tampa which handles the connection to the WMF database servers. So far it is unclear whether it is a server problem or a connection problem. I have tried to reach the WMF techs, but have had no response yet. I will keep you updated by mail, because JIRA is also hosted on amaranth and is therefore down as well.

Sincerely, DaB.

-- Userpage: [[:w:de:User:DaB.]] — PGP: 2B255885

Maarten Dammers

7 Feb 2013

7:04 p.m.

Hi Daniel,

Did some poking. Thanks to Andrew Otto and Chris Johnson the server is back online (hard reboot). It looks like it has some faulty hardware: http://nagios.toolserver.org/cgi-bin/status.cgi?host=amaranth

Can you restart the replication?

Maarten

Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette

DaB.

7:18 p.m.

Hello,

On Thursday 07 February 2013 23:15:23 DaB. wrote:

> Hi Daniel,
>
> Did some poking.

Thanks for that.

> Thanks to Andrew Otto and Chris Johnson the server is back online (hard reboot).

Yes, it is online and working again :-). Thanks to everyone involved!

> It looks like it has some faulty hardware: http://nagios.toolserver.org/cgi-bin/status.cgi?host=amaranth

Yes, the order for the replacement got lost in the WMDE office, but they are working on it now.

> Can you restart the replication?

Done, with the exception of wikidata and commons on cassia. I will look into them now.

> Maarten

Sincerely, DaB.

-- Userpage: [[:w:de:User:DaB.]] — PGP: 0x2d3ee2d42b255885

Silke Meyer

8 Feb 2013

9:13 a.m.

Hello!

>> It looks like it has some faulty hardware: http://nagios.toolserver.org/cgi-bin/status.cgi?host=amaranth
>
> Yes, the order for replacement got lost in the WMDE office, but they work on it now.

Well no - WMDE *did* inform WMF about this issue several times last year. It's hard for me to trace why the replacement didn't happen then, and to me it is more important to make it happen now.

(And when I took over the coordination I wasn't aware that this still needed a follow-up, because I thought Tampa was history, which is obviously wrong.)

Best,

-- Silke Meyer
System Administrator and Project Assistant, Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. (030) 219 158 260
http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.

DaB.

10:39 a.m.

Hello,

On Friday 08 February 2013 14:18:03 DaB. wrote:

> Well no - WMDE did inform WMF about this issue several times last year. It's hard for me to follow-up why the replacement didn't happen then and to me, it is more important to make it happen now.

OK, I tried to phrase it diplomatically. The truth would be that WMDE was not able to replace the broken hardware in more than 5 months, although I told them several times that it is broken and that it is important. It doesn't really matter whether WMDE just did not care, was busy, or the WMF did not respond.

> (And when I took over the coordination I wasn't aware that this still needs a follow-up because I thought Tampa was history which is obviously wrong.)

AFAIK you are a sysop yourself. So all you needed to do was to look into our Nagios, which would have told you that a) amaranth is still active and b) its hardware is still broken. Or you could have asked Nosy or me, or Sebastian.

Sincerely, DaB.

-- Userpage: [[:w:de:User:DaB.]] — PGP: 0x2d3ee2d42b255885
