Thanks Roan & Brad! We'll get back on track with wmf.1 deployments today :D
-Chad
On Wed, May 11, 2016 at 11:08 PM, Roan Kattouw rkattouw@wikimedia.org wrote:
TLDR: the bug is fixed and the errors have stopped.
I started working around this train hold by backporting the entire Echo extension from wmf1 to wmf23, assuming that the bug would be in MW core and updating Echo wouldn't affect it. Right after I deployed that, these errors started being thrown by wmf1 too.
It turned out that one of the Echo changes I backported stores the integer -1 in redis under some circumstances. RedisBagOStuff treats integers specially, in order to make incr() work: it stores them as plain numbers instead of PHP-serialized data. But when retrieving this value, the code didn't recognize -1 as a plain number because it didn't consist solely of digits ('-' is not a digit), so it thought it was PHP-serialized data and passed it to unserialize(), which caused the error. Apparently no one had ever tried to store a negative integer in redis (!) until my Echo change exposed the bug.
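The failure mode described above can be illustrated with a minimal Python sketch. This is not the RedisBagOStuff code: the function names, the "J:" prefix, and the use of JSON as a stand-in for PHP serialization are all assumptions made for illustration only.

```python
import json
import re


def encode(value):
    """Store ints as plain decimal text (so a server-side increment can
    operate on them); serialize everything else. Hypothetical helper."""
    if isinstance(value, int) and not isinstance(value, bool):
        return str(value)
    # "J:" + JSON stands in for PHP-serialized data in this sketch.
    return "J:" + json.dumps(value)


def _deserialize(data):
    """Stand-in for unserialize(): rejects anything that is not serialized."""
    if not data.startswith("J:"):
        raise ValueError(f"not serialized data: {data!r}")
    return json.loads(data[2:])


def decode_buggy(data):
    """Digits-only check: '-1' contains '-', so it is not recognized as a
    plain number and is wrongly handed to the deserializer."""
    if re.fullmatch(r"[0-9]+", data):
        return int(data)
    return _deserialize(data)


def decode_fixed(data):
    """Same check, but allowing an optional leading minus sign."""
    if re.fullmatch(r"-?[0-9]+", data):
        return int(data)
    return _deserialize(data)
```

With these definitions, `decode_buggy(encode(-1))` raises an error while `decode_fixed(encode(-1))` returns the integer, which mirrors the behaviour described in the message. The actual patch may have used a different check.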
Brad did all the hard work, diagnosing this and writing up a fix on Phabricator. I turned that into a patch and deployed it about an hour ago. There haven't been any more errors since then.
On Wed, May 11, 2016 at 1:59 PM, Chad Horohoe chorohoe@wikimedia.org wrote:
Hi,
When we deployed the first 1.28 release to the cluster yesterday, we got a new error[0] relating to unserialization of redis data. It's pretty spammy already, so I'm paranoid about deploying wider until we figure out why. Deploying some debugging work soon so we can figure out what's going on.
If you've got any information you think would help, please chime in on the bug.
-Chad
[0] https://phabricator.wikimedia.org/T134923
Engineering mailing list Engineering@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/engineering
It looks like the train didn't roll forward today? I was going to swat out a new test that depends on 1.28.0-wmf.1 but 1.27.0-wmf.23 still looks to be running group2
On Thu, May 12, 2016 at 9:01 AM, Chad Horohoe chorohoe@wikimedia.org wrote:
Thanks Roan & Brad! We'll get back on track with wmf.1 deployments today :D
-Chad
On Thu, May 12, 2016 at 7:33 PM, Erik Bernhardson <ebernhardson@wikimedia.org> wrote:
It looks like the train didn't roll forward today? I was going to swat out a new test that depends on 1.28.0-wmf.1 but 1.27.0-wmf.23 still looks to be running group2
For the record, it appears that group 2 was moved to 1.28.0-wmf.1 about an hour after Erik's message was sent.
On Fri, May 13, 2016 at 6:53 AM, Brad Jorsch (Anomie) <bjorsch@wikimedia.org> wrote:
For the record, it appears that group 2 was moved to 1.28.0-wmf.1 about an hour after Erik's message was sent.
Indeed. I rolled out group1 roughly during the window for group2. I waited a few hours (until after afternoon swat) and rolled out group2.
1.28.0-wmf.1 is everywhere now.
-Chad
wikitech-l@lists.wikimedia.org