On 28/12/2009 08:13 PM, Mike.lifeguard wrote:
Which admins? I know River is very experienced with Solaris - but are the other sysadmins more experienced with Solaris than linux as well?
We had a long discussion about this in the admins IRC channel... we all agree that having two separate OSs in the platform is bad, and that either everything should run Linux, or everything should run Solaris. So the only decision is which OS we should use.
(A third option would be to ditch both Linux and Solaris and move to another platform entirely... while I'm always looking for alternative solutions to consider, this is very unlikely to happen any time soon, so it's irrelevant to this discussion.)
From my point of view, there are three distinct admin tasks which are regularly needed for the TS. The first is general housekeeping: monitoring servers, killing errant processes, and so on. There is little difference between Linux and Solaris here, so we can ignore that.
The second is software installation. Debian has a large pre-built repository of software, while for Solaris we maintain our own packages, which means new software has to be built and packaged for installation. (Which is usually fairly simple, but takes a bit longer than running "apt-get".)
I don't see this as a particularly critical issue. Even if I end up having to build all the software myself, it won't add a significant amount of work to my existing workload.
The third is the more intricate maintenance and debugging work needed to keep the system running. This ranges from network installation profiles that allow reproducible server configurations, through performance tuning and OS patching, to debugging: MySQL, for example, is a complicated piece of software with many subtle interactions with the OS, and maintaining it requires good knowledge of both the OS and MySQL itself.
At the moment, the vast majority of this sort of work is done by me, which works because I have a good understanding of, and real-world experience with, the operating system we use (Solaris), and also experience using Solaris with the other software we use. On the other hand, I rarely use Linux, and have no experience of using Linux in complex systems like the Toolserver, or with MySQL.
Obviously I can install a Linux system and run MySQL on it, but the moment we run into some subtle operating system interaction or performance issue with MySQL, it's going to take me a lot longer to fix it than it would if we ran Solaris. It therefore seems to me that if we moved to Linux, the overall reliability of the Toolserver platform would be degraded.
The obvious counter-point here is that if all systems ran Linux, even if I weren't able to fix a problem, we would have a larger pool of admins with Linux experience who could. However, most of our admins have little time to spend on the Toolserver, and indeed some admins we've added in the past have effectively disappeared due to other commitments. Of the three currently active administrators other than myself, one is self-admittedly not an expert on either platform, and one has little time to spend on the Toolserver other than the work he already does, so that leaves one person with Linux experience. Moving most of the workload from one person to another person doesn't seem like any advantage at all.
One or two other people have offered their time to the Toolserver in the past, and have some amount of Linux experience, but given our experience with adding additional admins so far, I'm reluctant to keep adding people who disappear or only have time to perform housekeeping tasks; I'd rather wait for someone who can clearly commit a significant amount of time to the project.
Then again, what happens if a bus comes along and we then have admins who know linux best?
This is a valid concern, but if we used Linux, and our single active Linux admin was hit by a bus, we'd have the same problem. I hope there will be some changes over the next year or two which will make this issue moot... until then, I will try to avoid buses.
- river.