2009/2/16 Aryeh Gregor Simetrical+wikilist@gmail.com:
On Mon, Feb 16, 2009 at 9:18 AM, Tim Starling tstarling@wikimedia.org wrote:
I've deleted all the slow refreshLinks2 jobs which have apparently been preventing the job queue from making any headway for the last few months. Some people report that they have received hundreds of edit notification emails in the last few hours, due to the months of backlog now being cleared.
So are there no alarm bells that go off when the job queue is unreasonably long, or do people just not listen to them? Perhaps we could have a bot in #wikimedia-tech that would complain every hour if the oldest job in the queue is more than X days old?
Alternatively, the number of jobs processed per request could be made a function of the length of the backlog (in terms of time) - the longer the backlog is, the faster we process jobs. Then if the job queue gets to be months behind we would all notice it because everything would start running really slowly. (Obviously, the length of the job queue needs to be added to whatever diagnostic screen the devs first check when the site slows down, otherwise it won't help much.)
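A rough sketch of the scaling idea, written in Python purely for brevity rather than as a patch against the real job queue code. The function name, the base rate of one job per request, the one-extra-job-per-day slope and the cap are all made-up values for illustration, not anything MediaWiki actually does:

```python
from datetime import timedelta


def jobs_per_request(backlog_age, base_rate=1, extra_per_day=1, max_rate=50):
    """Hypothetical: how many queued jobs to run during one web request.

    backlog_age is the age of the oldest job still in the queue.
    All default values are illustrative assumptions.
    """
    # Whole days the queue is behind; a negative age (clock skew) counts as zero.
    days_behind = max(backlog_age, timedelta(0)) // timedelta(days=1)
    # Grow linearly with the backlog, but cap it so one request cannot be
    # asked to run an unbounded number of jobs.
    return min(base_rate + extra_per_day * days_behind, max_rate)
```

With these numbers a queue that is three days behind would run four jobs per request instead of one, and anything more than about seven weeks behind would sit at the cap. Where to put the cap is a judgment call: too low and a long backlog never drains, too high and requests become painfully slow, although a noticeable slowdown is partly the point.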