Re: [Wikitech-l] Page saving slowness and some loading breakage today

25 Sep 2008

Brion Vibber wrote:
...

 Posted this summary on blog, going out to en.planet.wikimedia.org...
 http://leuksman.com/log/2008/09/24/why-is-everything-broken-this-week/

 We’ve tracked down today’s problems to a combination of a couple of things:

    1. There’ve been ongoing database locking issues with the site
 statistics updates — these would all block on each other, making page
 saves very slow at times
    2. … which held open database connections, causing the text storage
 servers to start locking out new connections …
    3. … which exacerbated problems with the failover behavior of recent
 changes to the storage and load balancing code. 

I did see something like this before. I didn't revert the ES changes
because they weren't the issue, and the fact that the ES master went down
first allowed the site to continue in read-only mode. You could have just
increased the max connections on the ES masters, for the same effect: the
connection count on the core master would have overflowed instead.

But I did think I had found the root cause of the problem at the time.
Obviously I hadn't.

I think the ES load balancing changes were useful, and are a good way to
progress towards higher availability. I think a better way to fix the
site_stats contention would have been to insert an unconditional COMMIT in
SiteStatsUpdate::doUpdate().
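
To illustrate the idea of committing right after the counter update, here is
a minimal sketch in Python against an in-memory SQLite database. It is not
the MediaWiki code: update_site_stats() is a hypothetical stand-in for
SiteStatsUpdate::doUpdate(), and the one-column table is only for
illustration. The point is that the transaction ends immediately after the
update, so the lock on the stats row is not held for the rest of the request.

```python
import sqlite3

def update_site_stats(conn: sqlite3.Connection, edits_delta: int) -> None:
    # Hypothetical stand-in for the counter update; not the MediaWiki code.
    conn.execute(
        "UPDATE site_stats SET ss_total_edits = ss_total_edits + ?",
        (edits_delta,),
    )
    # Unconditional commit: end the transaction here so the lock on the
    # site_stats row is released before the rest of the request runs.
    conn.commit()

# Illustrative setup: a one-row stats table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site_stats (ss_total_edits INTEGER NOT NULL)")
conn.execute("INSERT INTO site_stats (ss_total_edits) VALUES (0)")
conn.commit()

update_site_stats(conn, 1)
total = conn.execute("SELECT ss_total_edits FROM site_stats").fetchone()[0]
```

After the call, conn.in_transaction is False, which is the property that
matters here: other writers waiting on the same row can proceed.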

If the connection count on the ES master really is a problem (not just a
symptom of a much larger problem), then that can be mitigated by closing
the connections early. But I think the only reason we're seeing this come
out on the ES servers is that they have the lowest maximum connection
limit, so they fail first.

-- Tim Starling
