Toolserver-l December 2012

toolserver-l@lists.wikimedia.org

32 participants
19 discussions

JIRA session loss
by Krinkle 17 Dec '12

17 Dec '12

Hi, I'm trying to go through some issues on JIRA and it keeps logging me out every few minutes. At some point I even logged in, clicked an issue, clicked Edit (which uses AJAX), and then the Edit screen wouldn't load because I was not authenticated (while I still saw my nickname at the top right). -- Krinkle

9 12

MySQL: explain and show are not working
by DaB. 16 Dec '12

16 Dec '12

Hello all, Dispenser messaged me because the query explain select * from enwiki_p.revision limit 1; isn't working anymore. As far as I can see, that's caused by the recent MySQL update. We need to patch this, but it may take a few days and another MySQL restart (which will be announced separately). You can follow the progress at [1]. Just to let you know. Sincerely, DaB. [1] https://jira.toolserver.org/browse/TS-1585 -- Userpage: [[:w:de:User:DaB.]] — PGP: 2B255885

1 0

Solaris system updates on Sunday
by Marlen Caemmerer 12 Dec '12

12 Dec '12

Hello, we will update the Solaris systems between 8pm and 10pm UTC on Sunday. The servers (ortelius, willow, clematis, hawthorn, wolfsbane, damiana, turnera, ptolemy) may also be rebooted if recommended by the update. Kind regards Marlen / nosy

1 0

MySQL Update Maintenance on Friday
by Marlen Caemmerer 12 Dec '12

12 Dec '12

Hello, MySQL needs a security update, and we have not had an update for a while anyway. This means we will update MySQL on Friday, 8pm - 10pm UTC. During this time the databases will restart at least once. Kind regards Marlen/nosy

1 0

Maintenance: Move of s6
by Marlen Caemmerer 09 Dec '12

09 Dec '12

Hello, I will move the s6 database instance to a SAN volume on 9th Dec, 9pm UTC - 11pm UTC. It is quite possible that the move will take an hour or more. Kind regards Marlen

1 1

Defective database s7
by Marlen Caemmerer 06 Dec '12

06 Dec '12

Hello, as some of you might have noticed, s7 is badly corrupted. Before I go into the details: we will most likely have to set up s7 again due to an InnoDB failure, and I need to find a workaround until we have the data in place, which can take days or even weeks.

Now the details. Some of you with database knowledge might have an idea, or might comment on my idea for a workaround.

The replication failed several times in the past days and I did not know why. I simply skipped the slave query when it failed and then replication ran again. Today the database process restarted without a slave query failing repeatedly. I had a close look and came to the idea that a broken transaction in the transaction log made it break. So I stopped the transaction from being played into the MySQL database when the database restarts by setting innodb_force_recovery = 3. Ok, fine. MySQL then starts. Cool. But the slave process won't run in this mode, so we don't have the new data. Hm.

So I tried to throw away the broken transaction. I moved the iblog-files and started MySQL again. MySQL then failed to come up, telling me:

121116 11:40:28 InnoDB: Error: page 7 log sequence number 270 492619208
InnoDB: is in the future! Current system log sequence number 268 2967383564.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-innodb-recovery.html

Ok. MySQL does not come up then and repeatedly restarts. No luck. Copied the log files back. Fine. Works again.

Now I tried several things to check which table might be corrupted. innochecksum reported everything fine. Mysqlcheck crashed the MySQL daemon when accessing centralauth.localnames. Oh? Why? Checking the table again crashes MySQL. Hm. Tried a repair table - "storage engine does not support this"...hm.

The log says:

InnoDB: Page lsn 268 3672100478, low 4 bytes of lsn at page end 3672100478
InnoDB: Page number (if stored to page already) 192520,
InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 428
InnoDB: Page may be an index page where index id is 0 1174
InnoDB: (index "PRIMARY" of table "centralauth"."localnames")
InnoDB: Error in page 192520 of index "PRIMARY" of table "centralauth"."localnames"
121116 13:02:46 - mysqld got signal 11 ;

I tried to remove the index to rebuild it, but this does not work due to innodb_force_recovery = 3. Mysqldump fails too and also crashes the MySQL daemon. So I don't have any more ideas how to fix this error.

Now I thought: if we have to set up s7 again anyway, I could drop the table completely and start MySQL in normal mode so replication works again. This would only mean s7 would lack this table until it is set up again. What do you think about this? Any more ideas? Cheers Marlen/nosy
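The InnoDB error line quoted in the message above already names the damaged page, index, and table. As a minimal sketch (not something used in the thread), the following Python pulls those fields out of error-log text so that affected tables can be listed quickly. The function name `find_corrupt_pages` and the regular expression are illustrative assumptions; the pattern is based only on the single log line quoted above and may not match other MySQL versions.

```python
import re

# Pattern modelled on the "Error in page ..." line quoted in the message.
# Assumption: other MySQL versions may word this line differently.
PAGE_ERROR = re.compile(
    r'InnoDB: Error in page (?P<page>\d+) of index "(?P<index>[^"]+)" '
    r'of table "(?P<db>[^"]+)"\."(?P<table>[^"]+)"'
)


def find_corrupt_pages(log_text):
    """Return (page, index, database, table) tuples found in log_text."""
    return [
        (int(m.group("page")), m.group("index"), m.group("db"), m.group("table"))
        for m in PAGE_ERROR.finditer(log_text)
    ]


# Sample text copied from the log excerpt in the message.
sample = (
    'InnoDB: Page may be an index page where index id is 0 1174 '
    'InnoDB: (index "PRIMARY" of table "centralauth"."localnames") '
    'InnoDB: Error in page 192520 of index "PRIMARY" of table '
    '"centralauth"."localnames" 121116 13:02:46 - mysqld got signal 11 ;'
)

if __name__ == "__main__":
    for page, index, db, table in find_corrupt_pages(sample):
        print(f"page {page}: index {index} of {db}.{table}")
```

Run against a full error log, this would give a short list of tables to examine before deciding which ones to drop or reload.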

4 7

SGE queues stalled
by Morten Wang 05 Dec '12

05 Dec '12

I've noticed that one of SuggestBot's hourly jobs has stalled for the past 7 hours, stuck in the "qw" state. Usually it runs like clockwork. Is there a problem with the SGE queues? Regards, Morten
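A job that sits in the "qw" state for hours, as described above, can be spotted automatically from `qstat` output. The sketch below is illustrative only: the function name `stalled_jobs`, the sample job names, and the user name are hypothetical, and the column layout and `MM/DD/YYYY HH:MM:SS` timestamp format are assumptions about the default Grid Engine `qstat` listing that should be checked against the local installation.

```python
from datetime import datetime, timedelta


def stalled_jobs(qstat_text, now, max_wait=timedelta(hours=1)):
    """Return (job_id, name, waited) for jobs queued longer than max_wait.

    Assumes whitespace-separated columns in the order:
    job-ID, prior, name, user, state, submit date, submit time, ...
    """
    stalled = []
    for line in qstat_text.splitlines():
        fields = line.split()
        # Skip the header, the separator line, and anything too short.
        if len(fields) < 7 or not fields[0].isdigit():
            continue
        job_id, name, state = fields[0], fields[2], fields[4]
        if state != "qw":
            continue
        submitted = datetime.strptime(
            fields[5] + " " + fields[6], "%m/%d/%Y %H:%M:%S"
        )
        waited = now - submitted
        if waited > max_wait:
            stalled.append((job_id, name, waited))
    return stalled


# Hypothetical qstat listing: one waiting job, one running job.
SAMPLE = """\
job-ID  prior   name       user         state submit/start at     queue      slots
----------------------------------------------------------------------------------
   1234 0.50000 hourly-job exampleuser  qw    12/05/2012 03:00:00                1
   1235 0.50000 other-job  exampleuser  r     12/05/2012 09:55:00 all.q@host     1
"""

if __name__ == "__main__":
    for job_id, name, waited in stalled_jobs(SAMPLE, datetime(2012, 12, 5, 10, 0, 0)):
        print(f"job {job_id} ({name}) has been waiting for {waited}")
```

In practice the text would come from running `qstat` rather than from a hard-coded sample, and `max_wait` would be set to a little more than the job's normal interval.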

3 5

Result of the general member meeting of WMDE
by DaB. 04 Dec '12

04 Dec '12

Hello all, I just got back from the general member meeting of Wikimedia Deutschland. As you know, I requested a decision about the future of the toolserver there. To make it short: it didn't go as well as I hoped. While the request itself was accepted, it was changed in some important parts. The main fear was that the WMF could stop providing us with fresh dumps and/or replication in the near future, making the toolserver more or less useless, although I learned from a participating WMF board member that no such board decision exists.

My request was changed in the following way: The WMF has to tell WMDE within 6 months how Wikilabs can replace the toolserver in the promised complete way. If the answer is not satisfying, WMDE will develop a "Governance-Model" to ensure the continuation of the toolserver. Different groups are invited into this "Governance-Model", and it should be done by the end of 2013.

That sounds good at first glance, but there are 2 loopholes. First, nobody defined what "complete" or "satisfying" means. In my eyes Wikilabs cannot replace the toolserver completely (in the sense that all tools can move there), and so the answer can only be unsatisfying, but that's just a question of definition, I guess. A second change was that the investment in the toolserver will be restricted to the "necessary". While that is of course a matter of definition again, I'm sure it means "no new hardware if it is possible in any way".

To summarize: In the best case we have to wait 6 months until WMDE officially learns that Wikilabs cannot replace us, then wait another 6 months until they create their "Governance-Model", and in 2014 we get new hardware. In the worst case we wait 6 months, and then WMDE and WMF agree that everything is ok, we never get any new hardware, and at some point the TS shuts down (of course with the remaining tools that cannot be migrated to Wikilabs). I cannot imagine outcomes between these two cases, but I'm sure they exist.

In any case we will get no (or nearly no) new hardware in 2013, so we have to live with that. One piece of good news is that the toolserver will get 3 new database servers soon. I have not decided yet if I will remain as root under these circumstances for 2013. I will tell you my decision by next Sunday. For now I will head to bed because I'm exhausted and disappointed. See you tomorrow. Sincerely, DaB. -- Userpage: [[:w:de:User:DaB.]] — PGP: 2B255885

6 7

S1
by John 03 Dec '12

03 Dec '12

Do we have an updated ETA on a non-corrupt s1?

3 5
