Hi all
The toolserver is currently bogged down, again, by several massive, long running queries by kmartin and edwardspec (guys, think what you are doing before launching a query - and if you kill one, make sure you *really* kill it).
As often, there's no root user online to take care of the issue (rob, please come back...). So, I have been thinking:
How about giving all users (or at least a group of "senior" users) a way to kill long running queries (if not whitelisted)? It has been policy for some time now that anything running more than an hour unannounced can be killed - so why not allow more people to do it?
I think this could be implemented as a stored procedure - or two, actually. One for listing the queries that are eligible for killing (long running, not whitelisted), and one for actually killing a query by ID. Would that be possible permission-wise? I.e. can permissions be elevated inside a stored procedure? Alternatively, this could be invoked from the command line using a small executable with the setuid flag.
Any such action should be logged, of course, and ideally an email should be sent to the affected user automatically. So, what do you think?
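To make the proposal a bit more concrete, here is a minimal sketch of the two steps (list what is eligible, then kill by ID and log it). It is only an illustration under assumptions: it assumes a MySQL-style server where `SHOW FULL PROCESSLIST` reports each thread's ID, user, command and running time and where `KILL <id>` ends a thread. The `Process` record, the `WHITELIST` contents, the function names and the `execute` callback are all hypothetical, and the permission question (stored procedure vs. setuid wrapper) is left open.

```python
import logging
from dataclasses import dataclass

MAX_SECONDS = 3600  # the "more than an hour" policy mentioned above
WHITELIST = frozenset({"replication"})  # hypothetical whitelisted accounts


@dataclass(frozen=True)
class Process:
    """One row of the process list (hypothetical shape)."""
    id: int
    user: str
    command: str   # e.g. "Query" or "Sleep"
    seconds: int   # how long the current command has been running
    query: str


def eligible_for_kill(processes, max_seconds=MAX_SECONDS, whitelist=WHITELIST):
    """Return running queries older than max_seconds whose user is not whitelisted."""
    return [
        p for p in processes
        if p.command == "Query"
        and p.seconds > max_seconds
        and p.user not in whitelist
    ]


def kill_statement(process_id):
    """Build the KILL statement for one process ID (int() rejects non-numeric input)."""
    return f"KILL {int(process_id)}"


def kill_eligible(processes, execute, requested_by,
                  log=logging.getLogger("querykill")):
    """Kill every eligible query via execute(sql) and log each action.

    Returns the killed processes so the caller can email their owners.
    """
    killed = []
    for p in eligible_for_kill(processes):
        execute(kill_statement(p.id))
        log.info("%s killed query %d of %s after %d s",
                 requested_by, p.id, p.user, p.seconds)
        killed.append(p)
    return killed
```

Here `execute` stands for whatever actually runs the statement with sufficient rights; passing `list.append` instead gives a dry run that only collects the statements.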
Alternatively, we would have to make sure to have a root around 24/7 - or perhaps install an automated email alert if replag rises above an hour or so? Can you think of a smart way to resolve the issue?
-- Daniel
PS: I'm aware that long running queries are not the only thing that may bog down zedler. But it's something that happens frequently, and is quite easy to detect. So I propose to start there. The next step would be memory hogs, I guess.
Hello, on Sunday, 16.07.2006, at 13:54 +0200, Daniel Kinzler wrote:
Hi all
The toolserver is currently bogged down, again, by several massive, long running queries by kmartin and edwardspec (guys, think what you are doing before launching a query - and if you kill one, make sure you *really* kill it).
nod.
As often, there's no root user online to take care of the issue (rob, please come back...). So, I have been thinking:
I have killed them now. Sorry, but I have to sleep a few hours a day. AFAIR you and leon have my mobile number, and river's number is in the MOTD message. So if it's really urgent, you can call at least 2 of us.
[...]
-- Daniel
Sincerely, DaB.
As often, there's no root user online to take care of the issue (rob, please come back...). So, I have been thinking:
I have killed them now. Sorry, but I have to sleep a few hours a day. AFAIR you and leon have my mobile number, and river's number is in the MOTD message. So if it's really urgent, you can call at least 2 of us.
This wasn't meant as a complaint - especially not against you, who are the most active on IRC of all the roots right now. And I didn't think it urgent enough to call you or river. It's just annoying as hell, and it would be nice to have a way to solve it ourselves.
-- Daniel
Hi
How about giving all users (or at least a group of "senior" users) a way to kill long running queries (if not whitelisted)? It has been policy for some time now that anything running more than an hour unannounced can be killed - so why not allow more people to do it?
This is a good idea, but then this could be done automatically too.
I am thinking of something like a cronjob that searches for long running queries every 15 minutes, sends an email to the user if a query has been running longer than 30 minutes, and kills it after two hours.
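The decision part of such a cronjob could look roughly like the sketch below. It only covers the thresholds named above (email after 30 minutes, kill after two hours); how the running times are read, how mail is sent and how the kill is issued are left out. The function name, the `already_warned` set (so a user is not mailed again on every 15-minute run) and the input shape are assumptions for illustration.

```python
WARN_AFTER = 30 * 60      # seconds: email the user after 30 minutes
KILL_AFTER = 2 * 60 * 60  # seconds: kill the query after two hours


def plan_actions(running, already_warned):
    """Decide what one cron run should do.

    running:        dict mapping query ID -> seconds the query has been running
    already_warned: set of query IDs that were emailed on an earlier run
    Returns (to_warn, to_kill) as two lists of query IDs.
    """
    to_warn, to_kill = [], []
    for query_id, seconds in sorted(running.items()):
        if seconds > KILL_AFTER:
            to_kill.append(query_id)
        elif seconds > WARN_AFTER and query_id not in already_warned:
            to_warn.append(query_id)
    return to_warn, to_kill
```

Run from a crontab entry with `*/15` in the minute field, a query would be mailed once on the first run after it passes 30 minutes and killed on the first run after it passes two hours, so the effective limits can be up to 15 minutes later than the nominal ones.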
BTW, is it possible to give the replication process a higher priority within the database server?
regards, Aka
Hello,
I have again killed a total of 10 tasks from edward, each of which had been running for over 1h (at least).
@edward: Please fix your scripts very soon.
Sincerely, DaB.
toolserver-l@lists.wikimedia.org