February 20, 2004

It's uptime, stupid!

During a discussion on performance metrics for system administrators on a SAGE mailing list earlier today, someone reported that at one of their workplaces, the lead system administrator had the words "IT'S UPTIME, STUPID" written on their whiteboard.

It occurred to me that this is one of the most profound statements that can be made about the business of system administration. You can have all the bells and whistles you want and implement every feature that management demands, but ultimately, if implementing something new is likely to make your service less robust, it should be resisted. There's no point in having a system with three times the features if it's down for ten times as long as the less featureful system. Part of the reason users are so cynical about computers "going down all the time" (even when they don't) is that demands for new features are too often allowed to overshadow the primary design consideration: the system should be stable, reliable, and available when people expect it to be there.

(An interesting point about system availability metrics is that they almost always look good to anyone who doesn't examine them closely. 99% availability is a piece of cake to achieve: the number looks impressive, but it allows roughly seven and a quarter hours of downtime a month. 99.9% is pretty easy too; that's about 45 minutes of downtime per month, or roughly eight and three-quarter hours spread over a year, which is quite achievable. It's only when the figures get better than that that there are real grounds to be smug. This one is a game of decimal places.)
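To make the "game of decimal places" concrete, here is a minimal sketch that converts an availability percentage into permitted downtime. It assumes an average month of 365.25 / 12 days (about 730.5 hours); the function and constant names are illustrative, not from any particular monitoring tool.

```python
# Minimal sketch: convert an availability percentage into allowed downtime.
# Assumption: an average month is 365.25 / 12 days (about 730.5 hours).
# Names here are hypothetical, chosen for illustration only.

HOURS_PER_YEAR = 365.25 * 24           # 8766 hours
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # 730.5 hours


def allowed_downtime_hours(availability_pct: float, period_hours: float) -> float:
    """Return the downtime (in hours) permitted over period_hours at the given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100")
    return period_hours * (100 - availability_pct) / 100


if __name__ == "__main__":
    # Each extra "nine" cuts the downtime budget by a factor of ten.
    for pct in (99.0, 99.9, 99.99, 99.999):
        month_min = allowed_downtime_hours(pct, HOURS_PER_MONTH) * 60
        year_hours = allowed_downtime_hours(pct, HOURS_PER_YEAR)
        print(f"{pct:>7}%: {month_min:8.1f} min/month, {year_hours:6.2f} h/year")
```

With these assumptions, 99% works out to about 7.3 hours a month and 99.9% to about 44 minutes a month; each additional nine divides the budget by ten.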

Posted by mpk at February 20, 2004 11:01 PM