Unless we’re living under skies of brimstone and hellfire, most companies shouldn’t have to replicate every piece of data to protect their business from the next cataclysmic event. Nor should they necessarily have to cough up millions for a mirror site that traces every network transaction.
And let’s face it, unless you’re cyber-cynical, catastrophes are extremely rare. Be that as it may, enterprises are increasingly being held accountable for their data and prudence points to being prepared.
We went on the hunt for the most commonly overlooked elements in today’s disaster recovery plans.
Understand business needs
Ultimately IT is there to serve business, and disaster recovery planning should be no different.
Sound hackneyed? Well, most IT shops still don’t get it. Experts in the field insist people are still making technology decisions, and not business decisions.
A lead consultant in business resiliency says recovery capabilities have to be matched to the business requirements.
“Understand that disaster recovery and business continuity are part of overall risk management,” he says. “It’s not just an IT thing.”
IT has a responsibility to understand how business workflow ties into business applications, and how those applications, in turn, are supported by infrastructure.
“One of the challenges I see all the time is that business continuity and disaster recovery fall back to the responsibility of IT, and IT’s normal response is to throw technology at it,” says another data recovery expert.
“We tend not to spend enough time communicating out there with the business units and understanding what their business problems are,” he says.
Know your enemies
As a type of insurance policy, it’s helpful to know what threats and vulnerabilities you’re likely to come up against.
Unless you’re in a tornado area, on a fault line or on a flood plain, you probably won’t be building a mirror site of your entire IT infrastructure.
But going through that vulnerability and risk assessment can be a heated debate, says George Kerns, president and CEO of Fusepoint Managed Services Inc.
The budget for a recovery plan is large compared to the operating budget, and if the chance of a disaster occurring isn’t high, how do you avoid spending too much?
“I think this has to come down to a rational conversation between the CIO and the CEO,” says Kerns. “They have to be aligned on what risks they’re willing to take with their business.”
Catastrophic failures of data centres are rare. They’re typically built for high availability, located in a secure area and supported by a network operations centre.
“Your disaster recovery plan is going to depend on how data-intensive your business is and what your company’s appetite for risk is.”
Map your support system
Often it’s not clear how an application is served up to a business process, and how the underlying infrastructure supports those applications. Unless you know all those pieces, you’re not going to be able to determine what a sensible disaster recovery plan looks like, say our experts.
Without system-to-application mapping, you cannot understand the interoperability and interrelationships you need to manage.
“This helps you understand the recovery bundles and where those single points of failure are in the environment.”
Having that business workflow to application to system mapping can also drive the discussion around cost. It gives executives a clear sense of the extent to which IT supports business.
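To make the mapping idea concrete, here is a minimal sketch of a workflow-to-application-to-system map. It assumes the relationships can be written down as simple lookup tables. Every workflow, application and system name below is hypothetical, invented purely for illustration. The sketch derives a recovery bundle for each workflow and flags systems that more than one workflow leans on, which are candidates for a closer look as possible single points of failure.

```python
from collections import defaultdict

# Hypothetical example data: all names are illustrative assumptions,
# not taken from any real environment.
WORKFLOW_TO_APPS = {
    "order_processing": ["web_store", "billing"],
    "payroll": ["hr_system", "billing"],
}
APP_TO_SYSTEMS = {
    "web_store": ["web_server_1", "db_cluster_a"],
    "billing": ["db_cluster_a"],
    "hr_system": ["app_server_2", "db_cluster_b"],
}


def recovery_bundles(workflow_to_apps, app_to_systems):
    """Return, per workflow, the set of systems that must come back together."""
    return {
        workflow: {
            system
            for app in apps
            for system in app_to_systems.get(app, [])
        }
        for workflow, apps in workflow_to_apps.items()
    }


def shared_systems(bundles):
    """Return systems that more than one workflow depends on."""
    users = defaultdict(set)
    for workflow, systems in bundles.items():
        for system in systems:
            users[system].add(workflow)
    # Keep only systems used by two or more workflows.
    return {s: wfs for s, wfs in users.items() if len(wfs) > 1}


bundles = recovery_bundles(WORKFLOW_TO_APPS, APP_TO_SYSTEMS)
shared = shared_systems(bundles)
```

In this made-up data, `db_cluster_a` turns up under both workflows. Whether a shared system really is a single point of failure depends on how much redundancy sits behind it, which a simple map like this does not capture.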
Kerns says a disaster recovery plan can be dissected in different ways. Depending on how fast they need to get a piece of their system back up and running, companies can look at recovery sites that are cold, warm or hot. Not every business application is as critical as the next.
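As a rough illustration of matching site type to urgency, the sketch below picks a cold, warm or hot site from the downtime an application can tolerate. The function name and the hour thresholds are assumptions chosen for the example. They are not figures from Kerns or from any standard, and each business would set its own.

```python
def recovery_site_tier(max_downtime_hours, hot_limit_hours=4, warm_limit_hours=48):
    """Suggest a recovery site type from tolerable downtime.

    The default limits are placeholder assumptions, not industry standards.
    """
    if max_downtime_hours < 0:
        raise ValueError("max_downtime_hours must not be negative")
    if max_downtime_hours <= hot_limit_hours:
        return "hot"   # must be back almost immediately
    if max_downtime_hours <= warm_limit_hours:
        return "warm"  # can wait a day or two
    return "cold"      # can wait longer


# Hypothetical applications and the downtime (in hours) each can tolerate.
tolerances = {"web_store": 1, "billing": 24, "archive_search": 168}
tiers = {app: recovery_site_tier(hours) for app, hours in tolerances.items()}
```

Sorting applications this way makes the cost conversation easier, since only the applications that land in the hot tier need the most expensive standby arrangement.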
Get the message out
Consistent communication is another key element that’s often overlooked. Business needs IT’s participation and IT needs plenty of time and access to resources.
“You have to have that executive-level buy-in or you’re probably going to put up a facade of a plan without investing in the resources,” says Kerns. “And the people who come up with the plan have to engage the people who are running IT operations,” he adds.
On another level, no one ever wants to talk about the gap in perceptions and expectations of recovery time.
Our experts suggest most business units believe their IT systems can be back up within hours, while IT will estimate a couple of days, and an actual assessment of the technology will reveal a further gap.
It’s not only about communicating business requirements to the IT people; sometimes it’s just being able to get in touch with people when something goes wrong.
Fail your test
Like any type of planning exercise, you have to test it and test it and test it.
Our experts suggest there is very little in the way of full end-to-end testing for applications across multiple platforms.
“This is a big area where more testing needs to be done, with a more rigorous, more integrated approach and a stronger level of governance around it,” says one of our leaders.
And don’t test to pass; you have to test to fail, they say. “If it fails, only then can you understand what to fix.”
Many organizations think they’re in a better position than they really are. “One of the biggest drawbacks is this pass-fail mentality. It doesn’t help to set yourself up to pass.”
You have to make sure it all fits together and there are no holes in it, adds Kerns.
“A lot of people put some effort into developing a plan so they can check the box and say they have a plan,” he says. “But forced into a true recovery situation, most companies would find their plans haven’t been updated and they’ve never been through a full-blown test.”
Keep up with change
Disaster recovery planning is not a one-time event. As your IT system evolves with new applications, upgrades and configuration changes, these changes will likely affect your recovery plan.
“Keep your technical recovery capabilities consistent with the latest production configurations.”
As new technologies and applications are added, they frequently don’t get copied over and the recovery side of things quickly falls out of date, he says. Change management processes and the application development lifecycle need to take recovery into account.
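One way to catch that kind of drift is to compare an inventory of what runs in production against what the recovery side holds. The sketch below assumes both inventories are available as simple name-to-version tables. The function name, component names and version strings are all hypothetical.

```python
def config_drift(production, recovery):
    """Compare two {component: version} inventories.

    Returns components missing from the recovery side and components
    whose recovery version differs from production.
    """
    missing = sorted(set(production) - set(recovery))
    stale = sorted(
        name
        for name in production
        if name in recovery and production[name] != recovery[name]
    )
    return {"missing": missing, "stale": stale}


# Hypothetical inventories, invented for illustration only.
production = {"web_store": "2.4", "billing": "1.9", "hr_system": "3.1"}
recovery = {"web_store": "2.4", "billing": "1.7"}

drift = config_drift(production, recovery)
```

Here `hr_system` was never copied over and `billing` has fallen behind. Running a check like this as part of change management is one way to keep the recovery side from quietly going stale.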
It’s an evergreen plan that must be kept current. “You can’t be working from something that’s three months or six months stale,” says Kerns.
“You have to keep rolling these things through so that you’re prepared.”