This week for our IRC bug triage, I decided to focus on problems
reported with caching. We focused on six bugs.
You can read the logs of the discussion:
http://hexm.de/54
The etherpad:
http://etherpad.wikimedia.org/BugTriage-2011-07
http://bugzilla.wikimedia.org/20468 — User::invalidateCache throws
1205: Lock wait timeout exceeded
These lock timeouts happen frequently enough that we can start to
track them down. As Tim said, to solve this: “We should reduce the
transaction time and number of locks in a transaction.”
Since these are showing up often enough, we'll start to log the
backtrace, figure out where it is being called from, and add
commit() where necessary.
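To make the logging idea concrete, here is a rough sketch in Python (not MediaWiki's PHP). Everything in it is an assumption for illustration: DatabaseError, call_and_log_lock_timeouts and invalidate_cache are made-up names, and only the 1205 error code comes from the bug title.

```python
import traceback

LOCK_WAIT_TIMEOUT = 1205  # MySQL error code quoted in the bug title


class DatabaseError(Exception):
    """Hypothetical stand-in for a database driver's error type."""

    def __init__(self, errno, message):
        super().__init__(message)
        self.errno = errno


def call_and_log_lock_timeouts(operation, log):
    """Run operation(); on a 1205 error, record the backtrace, then re-raise."""
    try:
        return operation()
    except DatabaseError as err:
        if err.errno == LOCK_WAIT_TIMEOUT:
            # format_exc() lists the frames leading to the failing call,
            # which is what tells us where the write was issued from.
            log.append(traceback.format_exc())
        raise


def invalidate_cache():
    # Hypothetical caller that runs into the timeout.
    raise DatabaseError(LOCK_WAIT_TIMEOUT, "Lock wait timeout exceeded")
```

With backtraces collected like this, the callers that show up most often would be the first candidates for a shorter transaction or an earlier commit().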
http://bugzilla.wikimedia.org/26338 — Wikimedia Javascript and CSS
files are getting an extra max-age cache-control param
This bug was filed back before ResourceLoader was deployed. After
Ryan confirmed that this is less of a problem now, he pointed to
a couple of places where files are still served without
ResourceLoader and that would benefit from added Apache
directives.
http://bugzilla.wikimedia.org/26360 — Disabling sessions in memcached
produces open() error
Before we got to this one in triage, Chad was already busy
investigating it. He thinks this was broken way back in r49370.
Under “You broke it, you buy it”, he is fixing the problem.
http://bugzilla.wikimedia.org/29223 — Querying for rvdiffto=prev fails
for many revids: "notcached"
Sam has reportedly been working on this one and may have already
fixed it in trunk. I’ll check with him.
All was not lost in the discussion of this bug, though. It
reminded Tim that there is a similar problem with action=parse.
It only fetches from the parser cache; it doesn't store to it.
This problem reduces our parser cache hit ratio significantly
since we have a growing number of action=parse hits due to Android
and iPhone apps.
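The effect on the hit ratio can be shown with a toy model. This is a minimal Python sketch, not MediaWiki's code: ParserCache, parse, fetch_only and fetch_and_store are all hypothetical names, and the "parse" step is a placeholder.

```python
class ParserCache:
    """Toy in-memory cache that counts hits and misses."""

    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        return None

    def set(self, key, value):
        self.store[key] = value


def parse(text):
    # Placeholder for an expensive parse.
    return text.upper()


def fetch_only(cache, key, text):
    """The behaviour described above: read the cache, never write it."""
    cached = cache.get(key)
    return cached if cached is not None else parse(text)


def fetch_and_store(cache, key, text):
    """Read the cache and save the result on a miss."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    output = parse(text)
    cache.set(key, output)
    return output
```

In this model, repeated requests for the same page through fetch_only miss every time, while fetch_and_store misses only on the first request.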
I filed a new bug to fix the problem Tim mentioned:
http://bugzilla.wikimedia.org/29907
http://bugzilla.wikimedia.org/29384 — Load order of request in IE6
messes with dependancy resolving (mediawiki.util not available in
time)
Krinkle has been looking into this one but doesn't yet know what
is causing it. Perhaps he and Trevor will have time to look at it
this coming week when he is in San Francisco.
http://bugzilla.wikimedia.org/29552 — Squid cache of redirect pages
don't get purged when page it redirects to gets edited
Much of the discussion for this bug and the next one overlapped,
but Tim suggested that we should be seeing the same problems with
templatelinks as we are seeing with redirect pages.
Roan responded that he thought there frequently were problems with
templatelinks but that they were misattributed to the job queue
rather than to Squid.
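For the redirect case, the behaviour the bug asks for amounts to purging the redirects along with the edited page. A minimal Python sketch, assuming a simple mapping from redirect title to target title (titles_to_purge and the mapping are hypothetical, not how MediaWiki stores or purges redirects):

```python
def titles_to_purge(edited_title, redirects):
    """Return the edited title plus every redirect that points at it.

    redirects maps a redirect's title to the title it redirects to.
    """
    titles = {edited_title}
    titles.update(
        source for source, target in redirects.items() if target == edited_title
    )
    return sorted(titles)
```

For example, titles_to_purge("Foo", {"Fu": "Foo", "Bar": "Baz"}) gives ["Foo", "Fu"], so the cached copy of the redirect "Fu" would be purged too.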
http://bugzilla.wikimedia.org/28613 — Thumbnails of updated files fail
to purge on squids
There is a lot of speculation as to *what* is causing these
problems. Initially, we thought the squid caching problem was a
symptom of a hardware issue that the new routers being installed
that week would fix.
With the new routers in place, though, it became clear that this
wasn't simply a matter of faulty hardware. After some discussion,
we thought packet loss (perhaps because MediaWiki does not
throttle the UDP packets it sends) might be a cause. I filed a
ticket in RT
(http://rt.wikimedia.org/Ticket/Display.html?id=1174)
to get Ops to add listeners to the multicast group so that we
could see if there was any packet loss and, if so, where it was
coming from.
If it turns out that there is no packet loss (or other network
problems), then we'll have to look at MediaWiki itself.
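Once listeners are on the multicast group, the comparison itself is simple. Here is a minimal Python sketch, assuming we can collect the list of purge URLs MediaWiki sent and the list each listener saw; find_lost_purges and the listener names are hypothetical.

```python
from collections import Counter


def find_lost_purges(sent, received_by_listener):
    """Return, per listener, the purge URLs that were sent but never seen."""
    sent_counts = Counter(sent)
    lost = {}
    for listener, received in received_by_listener.items():
        # Counter subtraction keeps only URLs seen fewer times than sent.
        missing = sent_counts - Counter(received)
        if missing:
            lost[listener] = sorted(missing.elements())
    return lost
```

A listener that appears in the result dropped packets somewhere between the sender and itself; an empty result would point back at MediaWiki rather than the network.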
Thanks to everyone's participation, I felt like this week's triage was
especially productive.
Till next week,
Mark.