Developing and evaluating data center maintenance programs

Glenn Weir

7 years ago

Many things in life require maintenance: your car, for example. Your data center is no different. An organization that wants to offer superior, reliable performance must dedicate time and resources to ensuring all its data systems are not merely going but going strong.

The answer to the question of why a company should maintain its systems is self-evident, and even data center managers with poor maintenance habits (and ready excuses such as “I don’t have enough time” or “My budget is limited”) will readily cop to its importance.

Failure to regularly maintain systems may in fact be more a matter of distraction than motivation (or lack thereof). Daily demands and mini crises — regular occurrences in many companies — take priority, leaving the less glamorous maintenance duties on the back burner. In many organizations, especially those working to make their mark, the focus on daily demands is permanent; maintenance, meanwhile, is “something we’ll get to.”

The longer data centers go without upkeep, the more likely they are to suffer downtime or other inefficiencies that hamper performance. The result of this is a kind of “death by a thousand cuts.”

Many downtime incidents (e.g., outright server failure) can be spotted quickly and remedied in a fairly straightforward fashion. However, inefficiency resulting from poor maintenance has a cumulative effect at least as serious as your standard, garden-variety downtime incident.

As a general example, let’s say the performance of a system that’s not being maintained is degraded by a “mere” two per cent. At that rate, every 50 hours of operation (a couple of days) costs the equivalent of one hour of downtime. However, while people might respond with lightning speed to a clear downtime incident, few might even notice a “mere” degradation.
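The arithmetic behind the two per cent example can be sketched in a few lines of Python. This is only an illustration, and it assumes a simple linear model in which a given fraction of lost performance counts as the same fraction of lost time; the function names are hypothetical and are not taken from any tool or standard.

```python
def equivalent_downtime_hours(operating_hours: float, degradation: float) -> float:
    """Express degraded operation as hours of full downtime.

    Assumption: a linear model, where running at 2% reduced performance
    for N hours is treated as losing 0.02 * N hours outright.
    """
    if not 0.0 <= degradation <= 1.0:
        raise ValueError("degradation must be a fraction between 0 and 1")
    return operating_hours * degradation


def hours_until_equivalent_downtime(downtime_hours: float, degradation: float) -> float:
    """Operating hours needed for degradation to add up to the given downtime."""
    if not 0.0 < degradation <= 1.0:
        raise ValueError("degradation must be a fraction greater than 0 and at most 1")
    return downtime_hours / degradation


if __name__ == "__main__":
    # The example from the text: a two per cent degradation.
    rate = 0.02
    print(f"Hours of operation per lost hour: {hours_until_equivalent_downtime(1, rate):.0f}")
    # Extending the same assumption across a full year (8,760 hours).
    print(f"Equivalent downtime per year: {equivalent_downtime_hours(8760, rate):.1f} hours")
```

Under the same assumption, a degradation that goes unnoticed for a full year adds up to roughly a week of lost capacity, which is the cumulative effect described above.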

Many, if not most, downtime events are likely never rigorously analyzed to uncover root causes, because data center managers are chiefly concerned with just getting things back up and running. Yet poor maintenance is often the root cause.

Organizations today are spending heroic sums of money building highly redundant data center facilities to deliver high-availability IT solutions to an increasingly information-reliant world. These big-dollar investments have yielded a variety of sophisticated facility infrastructure designs that are inherently reliable and progressively more energy-efficient.

However, no facility design — no matter how well planned and constructed — can withstand the disruption of a poorly designed or implemented Operations and Maintenance, or O&M, program. As inadequate maintenance and risk mitigation processes can quickly undermine a facility’s design intent, it is vitally important to understand how to properly structure and implement an O&M program to achieve the level of performance for which the facility has been configured.

While it may be commonly understood that a well-organized O&M program is required to achieve data center performance and efficiency goals, it can be difficult for those who are not maintenance professionals to understand what such a program looks like. The inherent resiliency of a facility will often mask operational deficiencies that have the potential to negatively impact data center availability, performance, and efficiency.

A free Schneider Electric publication, “A Framework for Developing and Evaluating Data Center Maintenance Programs,” provides:

a method for aligning the operational requirements of businesses with maintenance program standards that can be easily understood, communicated, and implemented throughout entire organizations;
details about the Tiered Infrastructure Maintenance Standard (TIMS)*; and
information about evaluating maintenance programs and interpreting the results.

*TIMS provides a straightforward method for evaluating the maturity of an O&M program (existing or proposed), gives an understanding of the associated level of risk, and helps effectively communicate these concepts throughout an organization.

Download “A Framework for Developing and Evaluating Data Center Maintenance Programs”