Roan Kattouw wrote:
On Tue, Mar 20, 2012 at 7:12 AM, MZMcBride <z@mzmcbride.com> wrote:
Some people have suggested that the current API behavior is intentional. That is, having every server return the same error code is better than having some servers return an error code and others not. I think this is flawed logic because of the problem it creates (scripts unable to get past the error code), but it's definitely something that needs investigation for the future.
You're confusing Apache servers and database servers. You're hitting different Apache servers at random; those are the srv numbers. Each Apache requests the current lag from all the DB servers serving that wiki, computes the maximum (in this case db36 wins easily) and checks that against your maxlag parameter. Obviously the highest lag is greater than your maxlag setting, so it throws an error and bails.
Had it continued, or had you not set a maxlag parameter, you would have hit a DB server at random (except that it wouldn't have let you hit db36 because it was lagged to hell and back). So yeah, /normally/ you hit DB servers at random and different servers might respond differently (or be lagged to different degrees), but in this particular case it was always the same DB server returning the same lag value. Nothing strange going on here; this is how the maxlag parameter works.
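To make the two steps described above concrete, here is a rough sketch in Python. It is not MediaWiki's actual code: the function names, the lag_cutoff threshold, the lag values, and every host name other than db36 are made up for illustration. It only shows the shape of the logic: take the maximum lag across all DB servers, compare it to maxlag, and otherwise pick a non-lagged server at random.

```python
import random

def check_maxlag(replica_lags, maxlag):
    """Return (host, lag) for the most lagged server if it exceeds maxlag.

    Returns None when no maxlag was supplied or the worst lag is within it.
    Hypothetical helper, not a MediaWiki function.
    """
    host, lag = max(replica_lags.items(), key=lambda kv: kv[1])
    if maxlag is not None and lag > maxlag:
        return host, lag
    return None

def pick_replica(replica_lags, lag_cutoff, rng=random):
    """Pick a random server whose lag is at or below lag_cutoff.

    lag_cutoff is an assumed threshold for "too lagged to serve reads".
    Returns None if every server is over the cutoff.
    """
    usable = [h for h, lag in replica_lags.items() if lag <= lag_cutoff]
    return rng.choice(usable) if usable else None

# Made-up lag values in seconds; only the name db36 comes from this thread.
lags = {"db36": 1800, "db12": 0, "db32": 2}

blocked = check_maxlag(lags, maxlag=5)   # every Apache computes the same answer
chosen = pick_replica(lags, lag_cutoff=30)  # db36 is never chosen here
```

With these numbers, every request that sets maxlag=5 is rejected because of db36, regardless of which Apache handles it, while a request without maxlag is served from one of the other two servers.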
Thank you for this explanation. It was very helpful.
I had some follow-up questions about what to do to mitigate this issue going forward (particularly server-side), but you and others are already brainstorming in this thread, so I'll slink back to my quiet world of client-side scripting. :-)
(And thanks again to Asher for depooling the lagged server, even as a temporary fix. I was able to do about 2500 talk page deliveries this morning without any issue.)
MZMcBride