-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Tim Starling wrote:
> I did see something like this before, and the reason I didn't revert the
> ES changes is that they weren't the issue, and the fact that the ES master
> went down first allowed the site to continue in read-only mode. You could
> have just increased the max connections on the ES masters, for the same
> effect. The connection count on the core master would have overflowed
> instead.
>
> But I did think I had found the root cause of the problem at the time;
> obviously I hadn't.

Doing the revert totally changed the performance characteristics of the
site, moving it from sitting around timing out to *being* readable.
I'm not sure what part was the problem, but something was definitely
wrong...

> I think the ES load balancing changes were useful, and are a good way to
> progress towards higher availability. I think a better way to fix the
> site_stats contention would have been to insert an unconditional COMMIT in
> SiteStatsUpdate::doUpdate().

Well, my main concern there is that if operations are weirdly ordered
you can end up with a total "transaction" half-committed... on the other
hand, these are done in deferred updates. They're in theory meant to be
something that won't kill ya if it fails, otherwise they'd have been...
not... deferred.
Either we need to rethink the old deferred updates system entirely and
turn them into immediate applications, or we should make them operate as
separate transactions (and potentially restartable in case they
separately get rolled back or deadlocked).
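
To make the "separate, restartable transactions" idea concrete, here is a rough sketch. It is not MediaWiki code (that would be PHP against MySQL); it uses Python's sqlite3 module as a stand-in, and run_deferred_updates, bump_edit_count and broken_update are hypothetical names made up for illustration. Each update commits on its own, so its locks are released immediately, and a failed update is rolled back and retried without touching the others.

```python
import sqlite3


def run_deferred_updates(conn, updates, max_attempts=3):
    """Run each deferred update in its own transaction, retrying on failure.

    `updates` is a list of (name, callable) pairs; each callable takes the
    connection. Returns the names of updates that still failed after
    max_attempts tries.
    """
    failed = []
    for name, update in updates:
        for _ in range(max_attempts):
            try:
                update(conn)
                conn.commit()  # commit right away so row locks are released
                break
            except sqlite3.Error:
                conn.rollback()  # undo only this update, then try it again
        else:
            # Every attempt failed; report it instead of aborting the rest.
            failed.append(name)
    return failed


# Hypothetical updates, for illustration only.
def bump_edit_count(conn):
    conn.execute("UPDATE site_stats SET ss_total_edits = ss_total_edits + 1")


def broken_update(conn):
    conn.execute("UPDATE no_such_table SET x = 1")  # always fails


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site_stats (ss_total_edits INTEGER)")
conn.execute("INSERT INTO site_stats VALUES (0)")
conn.commit()

failed = run_deferred_updates(
    conn, [("site_stats", bump_edit_count), ("broken", broken_update)]
)
```

On a real MySQL master you would probably retry only on deadlock or lock-wait-timeout errors rather than on every database error; the blanket catch here just keeps the sketch short.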

> If the connection count on the ES master really is a problem (not just a
> symptom of a much larger problem), then that can be mitigated by closing
> the connections early. But I think the only reason we're seeing this come
> out on the ES servers is that they have the lowest number of maximum
> connections, so they fail first.

It's probably easier to just bump the connection limits on ES to match
or exceed the core DBs. The actual activity should never be very
expensive, so a sleeping connection won't hurt much.
- -- brion
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkjbqm4ACgkQwRnhpk1wk44hTgCguADKzRCv4ygFeFk4x9nMRE5S
YiYAnj9h2mTFVXnT718Krca8Ptv3UmTK
=rels
-----END PGP SIGNATURE-----