On Tue, Mar 20, 2012 at 7:12 AM, MZMcBride
<z@mzmcbride.com> wrote:
Some people have suggested that the current API
behavior is intentional.
That is, that having every server return the same error code is better
than having some servers return an error code while others don't. I think
this logic is flawed because of the problems it presents (scripts unable to
get past the error code), but it's definitely something that needs
investigation going forward.
You're confusing Apache servers and database servers. You're hitting
different Apache servers at random; those are the srv numbers. Each
Apache server requests the current lag from all the DB servers serving
that wiki, computes the maximum (in this case db36 wins easily), and
checks that against your maxlag parameter. Obviously the highest lag is
greater than your maxlag setting, so it throws an error and bails.
Had it continued, or had you not set a maxlag parameter, you would
have hit a DB server at random (except that it wouldn't have let you
hit db36, because it was lagged to hell and back). So yeah, /normally/
you hit DB servers at random and different servers might respond
differently (or be lagged to different degrees), but in this
particular case it was always the same DB server reporting the same
lag value. Nothing strange going on here; this is how the maxlag
parameter works.
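The check described above can be sketched roughly like this. This is a
simplified illustration, not MediaWiki's actual code; the server names and
lag values are made up, and the error message only approximates the real
API's wording:

```python
def max_replication_lag(db_servers):
    """Return the highest replication lag (in seconds) among the DB servers."""
    return max(db_servers.values())

def check_maxlag(db_servers, maxlag):
    """Mimic the API's maxlag check: refuse the request if the most
    lagged replica exceeds the client-supplied maxlag threshold."""
    lag = max_replication_lag(db_servers)
    if maxlag is not None and lag > maxlag:
        # The real API also reports which server is lagged and by how much.
        return {"error": {"code": "maxlag",
                          "info": f"Waiting for a database server: {lag} seconds lagged"}}
    return {"ok": True}

# Hypothetical lag readings; db36 "wins easily", as in this incident.
lags = {"db32": 1, "db34": 2, "db36": 1800}
print(check_maxlag(lags, maxlag=5))     # maxlag error
print(check_maxlag(lags, maxlag=None))  # no maxlag set: request proceeds
```

Note that the check only looks at the maximum: with no maxlag parameter the
request goes through, and load balancing (not this check) is what steers
traffic away from the lagged replica.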
Thank you for this explanation. It was very helpful.
I had some follow-up questions about what to do to mitigate this issue going
forward (particularly server-side), but you and others are already
brainstorming in this thread, so I'll slink back to my quiet world of
client-side scripting. :-)
(And thanks again to Asher for depooling the lagged server, even as a
temporary fix. I was able to do about 2500 talk page deliveries this morning
without any issue.)
MZMcBride