Cleaning Up an IT Mess

So you have a shiny new job managing the IT infrastructure for Acme Widgets Inc, a company that has been in business for decades. You have barely sat down at your desk when the telephone rings. You pick it up but before you can utter a canned greeting, you find yourself being berated for some random failure of something you haven’t had time to even learn about. You shrug and dive in and eventually manufacture a solution to the problem. No sooner have you done that than the telephone rings again with another irate user. And so passes your first day. And your second. And your third. And your thirtieth. And your enthusiasm.

Eventually, you pitch management on hiring an assistant. They grudgingly accept that an assistant is needed, if only to handle things when you take your legally required annual vacation, but instead of giving you a full-time assistant, they tell you to train one of the in-house software development grunts. You choose which one. You leave the boardroom dejected. How are you supposed to get anything beyond firefighting done without a properly trained assistant?

Sullenly, you set about documenting every problem that occurs along with the solution you employ. You make notes about causes, underlying problems with the systems in use, and so on. It takes more time than you have, but you know the only way your part-time assistant has any hope of handling anything during your vacation (which you are not eligible for until sometime next year) is if he can refer to a tome containing everything he will ever need to know. You also know that such a tome is impossible to create, but you reason that even incomplete information is better than none at all.

As the weeks pass, you start to see patterns to the failures you encounter. A certain brand of ethernet switch fails in a certain way under specific conditions. A particular activity report causes the entire intranet server to freeze solid for three hours once every month. The computers on the eighteenth floor keep losing the network but no others do. And so on. You also realize that it is taking you less time to handle each fire as you become more and more familiar with the infrastructure. You have reached the tipping point where you now know intuitively how much of the system operates. You start finding it easier and easier to slot new information into your growing understanding of the mess you have inherited. And it is, for certain, a mess, having evolved organically over the decades of Acme Widgets Inc’s history. You even begin to understand management’s reluctance to provide a full-time assistant, though you still believe such a person is important.

Now, several months into your stint at Acme Widgets, you face a problem. You desire to improve the infrastructure. The motivation is purely selfish, of course. You are sick of having to put out fires continually and you know that improved infrastructure will reduce the time you spend doing so. But how do you proceed? You know that management will not accept a large expenditure all at once for a forklift upgrade of the entire infrastructure. You also realize that doing so is a bad idea, since equipment that is all installed at the same time will likely all wear out and fail at the same time. Instead, you need a different plan, one that fits within your departmental pittance of a budget.

You start by referring to your painstakingly created documentation, looking for gaps and patterns. You spend every free moment crawling all over the company offices, tracing wires and taking inventory of components ranging from 10Base2 tees and GigE switches to telephones and cable runs. It takes months to make sense of it all, during which time company employees have come to refer to you as The Inspector. Finally, you feel you understand the mess sufficiently and begin to form a plan.

You start with the ever-problematic eighteenth floor which, you have learned, still has a great deal of 10Base2 wiring linking switches from various vendors using media converters of varying quality all over the floor. You blow two months’ capital budget on a shiny new GigE switch and install it in the wiring closet. You wire it into the overall network and marvel at the fact that nothing crashes or fails when you do so. Then you painstakingly replace every network connection on the floor with a new Cat 6 cable run back to the wiring closet and, to avoid confusion later, carefully pull out every old wire that is decommissioned. The denizens of the floor snigger about The Inspector crawling all over the floor and spending hours climbing up and down ladders. But slowly, you complete your task and the regular calls to the eighteenth floor stop. One day you receive a call from one of the denizens of the floor wondering where you have been, for they have missed the entertainment of watching you at your task. None of them notice that their work has been more efficient, with fewer interruptions waiting on IT, since you started your project.

With the first upgrade project complete, you realize just how much of your time had been taken up by the eighteenth floor. Now you have time to examine other problems so you tackle the next most obvious problem using the same methodical approach. Again, you are amazed at the time saving once that problem is sorted out. Re-energized, you continue incrementally replacing or otherwise updating problematic infrastructure, each time tackling the item that causes the highest call volume.

Suddenly, after several years of diligent work upgrading, replacing, and otherwise fixing the creaking IT mess you inherited, you find yourself spending the better part of every day sitting at your desk idly clicking around your fancy new infrastructure monitoring system while you wait for new workstation installations to complete. You no longer spend every minute of the day fighting fires. Sure, the occasional problem pops up and you handle it efficiently. The old nickname from your early days has stuck around but nobody remembers why. Management begins to question even your part-time assistant. In your quest for laziness, you have demonstrated that the company only needs one person to manage the whole IT infrastructure. Oops.

With management rumbling about downsizing your department to just you, you wonder what you can do to make them see the light. Suddenly, you remember you have not taken a vacation in far too long. You schedule your vacation time and the bean counters are ecstatic to have that liability off the books. You leave all technology behind and set off for the back country where you will be unreachable for three weeks. During that time, your part-time assistant will have to hold the fort.

Upon your return, you step into a hornet’s nest of activity but it cannot touch the serenity acquired through weeks away from it all. Your assistant strolls up to you and casually notes that the primary server is down, the failover didn’t, and the vendors are pointing fingers at each other. You casually ask why the failover failed and your assistant shrugs. Cleaning staff pulled the power cord out of the power socket and promptly fried the UPS. Why were the cleaning staff in the server room? Management. What is the ETR? Your assistant glances at his watch and smiles. Everything is sorted out now. You raise your eyebrow. It is all a prank, it seems.

In the calm of your office, your assistant briefs you on your absence. It seems an unusually high number of problems materialized while you were away but your assistant has everything well in hand. You check over his work and are mostly pleased – he has not messed up any more than expected, or, indeed, any more than you would have. And he has kept a detailed log, bless him. Armed with that information, you confront management and they happily agree that your assistant should be full-time. You smile and set to work planning for the expansion you know management will eventually spring on you without notice.


In case it was not obvious, the preceding is a fairy tale. Very seldom will an IT job progress in that manner. Still, the goal of the unnamed IT person in the tale should be the goal of any IT manager, and the path taken to achieve that goal is sensible. A wholesale replacement of everything all at once is usually expensive and causes sticker shock for bean counters and management alike, whereas incremental updates often disappear into the noise. And by strategically picking what to upgrade, a substantial impact may be seen for a relatively minor investment. Rather than developing a Rube Goldberg plan of action and then slavishly sticking to it, our hero tackled one existing source of annoyance and corrected it before moving on to the next. Instead of treating the mess as a single problem, our hero broke it into smaller pieces that were more easily identifiable and fixable.
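For the curious, the prioritization in the tale (always tackle whatever generates the most calls next) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log format, the cause labels, the dates, and the function name `rank_causes` are all invented here, and a real incident log would need its causes recorded consistently before a tally like this means anything.

```python
from collections import Counter

# Hypothetical incident log: one (date, root_cause) pair per support call.
# Every entry here is made up for illustration.
incident_log = [
    ("2024-03-01", "floor 18 network drop"),
    ("2024-03-02", "intranet report freeze"),
    ("2024-03-04", "floor 18 network drop"),
    ("2024-03-05", "switch brand X lockup"),
    ("2024-03-07", "floor 18 network drop"),
    ("2024-03-08", "switch brand X lockup"),
]


def rank_causes(log):
    """Return (cause, call_count) pairs, most frequent cause first."""
    counts = Counter(cause for _date, cause in log)
    return counts.most_common()


ranking = rank_causes(incident_log)

# The top entry is the candidate for the next upgrade project.
next_project = ranking[0][0]

for cause, calls in ranking:
    print(f"{calls:3d} calls  {cause}")
```

Re-running the tally after each project is finished is the whole trick: the list reshuffles, and the next biggest time sink floats to the top.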

In case you are wondering how practical such a plan of attack might be, consider that, though the details were substantially different and the scale smaller, my own day job progressed in much the same manner. Over the space of years, I slowly moved a production network through dozens of minor steps from a disaster waiting to happen if anyone merely tripped on a network cable to the fairly resilient network that is in place today, and I did it on what would be considered a shoestring budget.
