Advice for a happy machine room
A lot of companies (and universities) invest a lot of money in a proper machine room with conditioned power, air conditioning, fire suppression systems and plenty of cable management for their central computers. This is an excellent idea as computers love air-conditioned rooms, and the improvement in failure rates and other troubles when they're not just being kept in a hot, dusty room is remarkable.
But where some places don't pay as much attention as they should is in the detail. You may have a nice machine room, but is it in a condition where you can work in there efficiently, particularly when disaster strikes and you're under pressure to restore service as soon as possible, maybe on your own late on a Sunday evening? Having considered this earlier today, I'd like to make the following recommendations, mostly based on bitter experience, to anyone with system administration responsibilities - whether your machines live in a cupboard or in a nice big spacious machine room.
- The machine room is not a store room. Empty boxes, redundant kit dumped on the floor and any other extraneous junk are just things to get in the way when you need to work around them. Pay particular attention to this if you have a raised floor - if you're in a hurry, having to move a pile of junk PCs to lift the tile you need to lift to get access to a power feed right now isn't something you want to do. This can be a real problem if you have a generously sized machine room as people will often just dump stuff in there "so it's out of the way". If that happens, I suggest dumping stuff in their offices or on their desks, as that's "out of the way" as far as the machine room's concerned.
- Know your power. Whether it's just a single power outlet or multiple three-phase UPSs, it's in your own interest to get to know how power is distributed in your machine room and where all the relevant breakers, fuses and panic buttons are. A lot of panic buttons of the mushroom type are latching and need to be reset with a key - it's helpful to know this before you spend ages checking power feeds because the room won't repower for no obvious reason. I'm not suggesting that you should be prepared to rewire your distribution board (I do keep a set of insulated electrician's screwdrivers in the toolbox at work, but I'd only use them in the direst of emergencies), but do know which breakers to check first. It's also a good idea to know the routes that your telecoms circuits take out of the building.
- Tools are useful things. Every machine room should have a toolbox that's for the machine room - not for people to borrow stuff from if they're working elsewhere, not for taking home to fix your PC with, but for the machine room. Label it to make sure it's not removed, or at least so you can point at the label when people nick it. Believe me, you don't want to be screwing around (ahem) trying to find a screwdriver when you really, really need a screwdriver right now. Sure, people may have their own tools, but you'll probably find they're either locked in desk drawers, buried under piles of crap or just not there at all when you need them. Until recently I maintained loudly that the machine room toolbox should be locked with a combination padlock to stop things disappearing when you have contractors or service engineers in, but I've since come to the conclusion that locking the toolbox is just another thing to worry about. What if you can't remember the combination or can't find the key in an emergency? I'll be saying more about the things that should be in that toolbox in my next posting.
- Produce a startup/shutdown list. Unless all your stuff's on one or two servers, it's very unlikely that you can just turn on everything at once and have it all come up tickety-boo. In most environments, it's necessary to ensure that some services (DNS, essential file servers, LDAP, NIS, etc.) are available before bringing the things which rely on them online, and having a simple ticklist that makes the right startup order clear makes life much, much easier. I'd actually extend this and suggest that you create a "grab file" containing all the essential information - startup orders, power distribution, network data, contact phone numbers and anything else needed in an emergency.
- Building managers hate machine rooms. People who don't work in IT often have a habit of seeing computer rooms as expensive luxuries for those already-expensive people in IT. All that specialist air conditioning, trunking and fire suppression is expensive stuff, and dedicating a room to computers is increasingly seen as an unnecessary extravagance in the modern working environment where a million staff are being shoehorned into open plan offices. It's worth your while trying to disabuse estates people of this notion whenever possible - not only are they less likely to come sniffing round looking to reclaim space to shove yet more open plan desks into, but you'll also have a better working relationship for when you need their help in an emergency, or when the air con's getting unreliable and needs replacing. I say this as one who's lived through at least two hot summers with shaky machine room air conditioning.
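Going back to the startup/shutdown list for a moment: if you'd rather generate that ticklist than maintain it by hand, the ordering problem is a plain dependency sort. What follows is only a minimal sketch in Python (3.9 or later, for the standard `graphlib` module). The service names and the `DEPENDS_ON` map are made up for illustration, and `startup_order` and `shutdown_order` are hypothetical helpers, so substitute whatever your own environment really depends on.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical dependency map: each service lists the services that
# must already be up before it can be started. Replace with your own.
DEPENDS_ON = {
    "dns": [],
    "ldap": ["dns"],
    "nis": ["dns"],
    "fileserver": ["dns", "ldap"],
    "mail": ["dns", "ldap", "fileserver"],
    "web": ["dns", "fileserver"],
}


def startup_order(depends_on):
    """Return the services in an order where every dependency comes first."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        # err.args[1] holds the list of services that form the loop.
        raise ValueError(f"circular dependency: {err.args[1]}") from err


def shutdown_order(depends_on):
    """Shut down in the reverse of the startup order."""
    return list(reversed(startup_order(depends_on)))


if __name__ == "__main__":
    print("Startup ticklist:")
    for step, name in enumerate(startup_order(DEPENDS_ON), start=1):
        print(f"  [ ] {step}. bring up {name}")
    print("Shutdown ticklist:")
    for step, name in enumerate(shutdown_order(DEPENDS_ON), start=1):
        print(f"  [ ] {step}. shut down {name}")
```

Print the result and put it in the grab file; a list that only exists on a server that's currently switched off isn't much use. The sort also complains about circular dependencies, which is something you'd much rather find out about on a quiet afternoon than on that Sunday evening.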
I think that's about it. There might be a few other bits and pieces that spring to mind (let me know if you think I've missed anything obvious), but I'll be expanding on some of these points at a later date anyway.
Posted by mpk at May 15, 2004 1:52 PM