Several users were worried about toolserver tools that stopped working for no reason apparent to them. I have tried to explain it in non-technical German [1], and it seems that helped.
I was then asked if these issues are communicated somewhere centrally. I found [2], which points to [3], which returns 404. It would probably be too technical anyway. status.toolserver.org would not help the average user either, even if it were always up-to-date.
So, should issues that affect the toolserver users (either the code-writing or the tool-using, or both) be mentioned somewhere? There is the toolserver blog [4], with the last entry from February 2010...
I think if all toolserver admins had write access to that blog, one of them could spare 5 minutes (if that) to update the general public about major issues or changes (changing stuff on the toolserver, even supposedly minor changes, tends to break stuff...).
Cheers, Magnus
[1] http://commons.wikimedia.org/wiki/Commons:Forum#Zentrale_Info_.C3.BCber_Tech...
[2] http://meta.wikimedia.org/wiki/Toolserver/MaintenanceLog
[3] https://jira.toolserver.org/display/tech/Maintenance+log
[4] http://journal.toolserver.org/
Hello, At Wednesday 17 August 2011 14:42:49 DaB. wrote:
I guess we have to distinguish two cases: emergency problems (like the outage tonight) and the rest. During emergencies, all the roots have time for is to update status.toolserver.org (which was repaired last night) and maybe send an email to the announce list. I think it is clear that we then have other things to do than writing long texts ;-). For everything else, I think reviving the blog would be great (I first have to remember how to log in there). What is not an option is for us to write something on all/most/a few wikis, but you (the tool programmers) can do that. You speak the languages and know where to write it (so thanks to you, Magnus, for informing Commons).
Sincerely, DaB.
[1] http://commons.wikimedia.org/wiki/Commons:Forum#Zentrale_Info_.C3.BCber_Technikprobleme
[2] http://meta.wikimedia.org/wiki/Toolserver/MaintenanceLog
[3] https://jira.toolserver.org/display/tech/Maintenance+log
[4] http://journal.toolserver.org/
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
Just wondering...
Can't Wikimedia add the toolserver to the status.wikimedia.org page? That would mean no more manual status updates.
announce. I think it is clear that we then have other things to do than writing long texts ;-).
No, that is not clear _at all_. I realize the importance of toolserver maintenance. But let's face it, you guys are providing the infrastructure FOR THE TOOLS. No end user cares about the toolserver itself; people care about whether the tools are working or not. And many times these tools break or need intervention when a part of the TS cluster breaks. The tool authors need to be informed so that they in turn can take action to fix their tools. This is important, and I think it warrants taking a few minutes to write a notification.
Daniel Schwen lists@schwen.de wrote: announce. I think it is clear that we then have other things to do than writing long texts ;-).
No, that is not clear _at all_. I realize the importance of toolserver maintenance. But let's face it, you guys are providing the infrastructure FOR THE TOOLS.
What about using http://nagios.toolserver.org/ instead of the status webpage that has to be updated manually? Sure, it's not as readable as a plain text description, but maybe if hosts/services are described in a bit more verbose way it would work better?
//Saper
Hello, At Wednesday 17 August 2011 16:02:49 DaB. wrote:
No, that is not clear _at all_. I realize the importance of toolserver maintenance.
Maintenance is something you know will happen BEFORE it happens. An outage/crash/emergency is something you have no idea will happen in the next minute.
Sincerely, DaB.
Maintenance is something you know will happen BEFORE it happens. An outage/crash/emergency is something you have no idea will happen in the next minute.
Sure, I was a bit sloppy with the terminology, but the message still stands. Even in an emergency, please think about _priorities_. Five minutes taken to write a notification could save several tool developers hours of
* figuring out what is happening
* having long downtimes due to crashed processes, broken/inconsistent database tables
* cleaning up after bots that break or go berserk due to server outages
Sure, you can start pointing fingers at the tool developers and yell "write better tools". But let me remind you (in case it was forgotten): the tool developers already donate their free time to create tools that benefit the Wikimedia user base. We are not doing this purely for our amusement. I'm grateful the toolserver exists, but I'm also doing my share of the work! All I'm asking is that the admins recognize that part of their service should be not making my work harder than it already is.
On Wed, Aug 17, 2011 at 10:20 AM, Daniel Schwen lists@schwen.de wrote:
Sure, I was a bit sloppy with the terminology, but the message still stands. Even in an emergency, please think about _priorities_. Five minutes taken to write a notification
I think this needs a little bit of clarifying.
A notification should be written if the downtime is going to be more extended or if you stop working on it for some amount of time (you can't do it, you go to bed, or something like that). If you're busy working on fixing it, that's your first priority and I don't think anyone wants you to stop prioritizing that. :-)
Wikimedia ops are the same way -- it's the sysadmins' first priority to fix up the techie stuff, that's their job. If someone's not doing anything at the moment and has some free time, *then* they should make communicating about the issue a priority. In the meantime, anyone watching what's going on in #wikimedia-toolserver can share what they're seeing with confused end users.
I think this needs a little bit of clarifying.
Ok, here we go:
Priority should _not_ be whether "the toolserver" is running, it should be "are the tools running"!
I thought this was a no-brainer. You may think these are the same thing, but they are not! Fixing a toolserver problem is the necessary _minimum_, but it is not _sufficient_ to assist in making sure the tools run. For that you need to keep the tool developers in the loop!
On Wed, Aug 17, 2011 at 09:44:41AM -0500, Daniel Schwen wrote:
I think this needs a little bit of clarifying.
Ok, here we go:
Priority should _not_ be whether "the toolserver" is running, it should be "are the tools running"!
I thought this was a no-brainer. You may think these are the same thing, but they are not! Fixing a toolserver problem is the necessary _minimum_, but it is not _sufficient_ to assist in making sure the tools run. For that you need to keep the tool developers in the loop!
Hi Daniel,
But does that really need to happen *during* the outage? I would much prefer to have the toolserver up quickly. Of course, shortly after the outage I do expect a mail briefly describing what happened, and then, if needed, tool owners can check and act. And that is what has normally been happening as well.
A mail during the outage is appreciated, and mostly possible, but fixing should still be the first priority.
Regards,
Andre
On Wed, Aug 17, 2011 at 3:33 PM, Casey Brown lists@caseybrown.org wrote:
On Wed, Aug 17, 2011 at 10:20 AM, Daniel Schwen lists@schwen.de wrote:
Sure, I was a bit sloppy with the terminology, but the message still stands. Even in an emergency, please think about _priorities_. Five minutes taken to write a notification
I think this needs a little bit of clarifying.
A notification should be written if the downtime is going to be more extended or if you stop working on it for some amount of time (you can't do it, you go to bed, or something like that). If you're busy working on fixing it, that's your first priority and I don't think anyone wants you to stop prioritizing that. :-)
100% agree.
Wikimedia ops are the same way -- it's the sysadmins' first priority to fix up the techie stuff, that's their job. If someone's not doing anything at the moment and has some free time, *then* they should make communicating about the issue a priority. In the meantime, anyone watching what's going on in #wikimedia-toolserver can share what they're seeing with confused end users.
Idea: How about some one-line message detailing current issues (if any) with the toolserver that I could transclude from a single location on top of my tools? That would alert all "my" users that there is a problem with the toolserver (and not my tool specifically), and that it's being worked on.
Magnus
Hello, At Wednesday 17 August 2011 16:57:08 DaB. wrote:
Idea: How about some one-line message detailing current issues (if any) with the toolserver that I could transclude from a single location on top of my tools? That would alert all "my" users that there is a problem with the toolserver (and not my tool specifically), and that it's being worked on.
That's what the status files are for (see the subthread by Platonides from 8 August). But yesterday they wouldn't have helped, because the web pages were not working either (but neither were your tools, so unavailable status files were the minor problem).
Sincerely, DaB.
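For illustration only, here is a minimal Python sketch of the "one-line notice on top of a tool" idea from this subthread. It assumes a plain-text status file whose first non-empty line is the current notice. The path, the file format, and the names read_status and render_banner are all hypothetical and are not taken from the actual toolserver status files.

```python
from html import escape
from pathlib import Path
from typing import Optional

# Hypothetical location; the real status files may live elsewhere
# and may use a different format.
STATUS_FILE = Path("/var/www/status/current.txt")


def read_status(path: Path = STATUS_FILE) -> Optional[str]:
    """Return the first non-empty line of the status file, or None."""
    try:
        text = path.read_text(encoding="utf-8")
    except OSError:
        # Missing or unreadable file: treat as "no known issue".
        return None
    for line in text.splitlines():
        line = line.strip()
        if line:
            return line
    return None


def render_banner(message: Optional[str]) -> str:
    """Build an HTML banner for the notice, or '' when there is none."""
    if not message:
        return ""
    # Escape the message so the status text cannot inject markup.
    return '<div class="ts-notice">Toolserver notice: ' + escape(message) + "</div>"
```

A tool could print render_banner(read_status()) at the top of each page. As noted above, this does not help when the web server itself is down.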