If you woke up tomorrow and ran a marathon, how would you fare? It’s highly doubtful that you would successfully run the 26.2 miles without months of training, drills, and exercises.
The same is true for disaster recovery (DR): The chance that you could successfully recover IT operations without having exercised your DR plans on a regular basis is slim at best. The chance that you could successfully recover and meet your recovery objectives is zero. Yet Forrester finds that exercising DR plans is one area in which many organizations continue to fall short.
Although most enterprises claim they conduct a full exercise of their DR plans at least once per year, anecdotal evidence suggests that the majority of these exercises are not comprehensive; enterprises often exercise only a portion of the plan or a subset of applications. Indeed, many of the organizations Forrester has spoken with know that they need to improve their DR exercise program but face barriers such as a lack of executive support, limited employee resources, and a fear of interrupting business processes. If this sounds all too familiar, consider the following 10 best practices for updating and improving your current DR exercise program:
1. Define Specific Exercise Objectives Up Front
Exercising for the sake of exercising is a waste of time. Set clear, concrete objectives up front so that you can judge whether the exercise succeeded. One objective may be as simple as, “Verify our stated recovery time and recovery point objectives.” You could orient other objectives around training, such as, “Familiarize the database administrators with the plans for recovering Oracle.”
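An objective such as verifying recovery time and recovery point objectives can be checked with simple arithmetic on timestamps recorded during the exercise. The following is a minimal sketch in Python, not a prescribed method: the class, field names, function name, and sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ExerciseResult:
    """Timestamps recorded during one DR exercise (field names are hypothetical)."""
    outage_declared: datetime        # when the simulated disaster was declared
    service_restored: datetime       # when stakeholders confirmed the service was usable
    last_recoverable_data: datetime  # timestamp of the newest data present after recovery


def check_objectives(result: ExerciseResult, rto: timedelta, rpo: timedelta) -> dict:
    """Compare measured recovery time and data loss against the stated objectives."""
    recovery_time = result.service_restored - result.outage_declared
    data_loss_window = result.outage_declared - result.last_recoverable_data
    return {
        "recovery_time": recovery_time,
        "data_loss_window": data_loss_window,
        "rto_met": recovery_time <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


# Illustrative values only; they do not come from any real exercise.
result = ExerciseResult(
    outage_declared=datetime(2024, 1, 13, 9, 0),
    service_restored=datetime(2024, 1, 13, 14, 30),
    last_recoverable_data=datetime(2024, 1, 13, 8, 45),
)
report = check_objectives(result, rto=timedelta(hours=4), rpo=timedelta(minutes=30))
print(report["rto_met"], report["rpo_met"])  # prints "False True"
```

In this made-up run, recovery took five and a half hours against a four-hour recovery time objective, so that objective was missed, while the 15-minute data loss window fell within the 30-minute recovery point objective. Recording results this way gives each exercise an unambiguous pass or fail against the objectives set beforehand.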
2. Include Business Stakeholders
Business owners play a vital role in your DR exercises, and you need to involve them from the start of the exercise until you have recovered all services. Business stakeholders should verify the successful recovery of services. This has the dual benefit of ensuring that you have properly recovered business processes with all of their critical components as well as ensuring that business stakeholders know what to expect in terms of recovery capabilities and performance at the recovery site during an actual declaration.
3. Rotate Staff Responsibilities
It’s important that the person who wrote the DR plan is not the same person who executes the exercise, because you cannot count on that individual being available in a real disaster. Some companies Forrester interviewed went so far as to have employees with little specific knowledge of a system execute the exercise, such as a system administrator running the database DR test. An important secondary benefit of a DR exercise is training; by assigning staff to take on new roles during exercises, you are essentially cross-training staff in different areas.
4. Develop Specific Risk Scenarios for Your Exercises
Many enterprises conduct their DR exercises without specific scenarios; they tell the response team to assume the data center is “a smoking hole.” It is important, however, to define specific risk scenarios even for DR testing, for two main reasons: 1) a specific scenario gives the response team a more realistic situation to react to, and 2) different scenarios require different actions from the IT staff. For example, the DR plan for a short outage at the primary data center that only requires resuming operations would differ from the plan for a long-term outage that requires failover (and eventually failback), which in turn would differ from the plan for scenarios where only portions of the IT infrastructure are down.
5. Run Joint Exercises With Business Continuity (BC) Teams
In its research, Forrester found that many BC and DR teams run all of their exercises separately and often fail even to tell each other when they run exercises. However, you should aim to exercise the full enterprise BC and DR plans concurrently at least once per year. This is especially important if the data center is in the same location as corporate headquarters.
6. Vary Exercise Types From Technical Tests to Walk-Throughs
A common misconception in IT is that walk-throughs and tabletop exercises are not necessary for DR exercises. While it’s true that these types of exercises won’t test the technical capabilities of a failover, they are still critical for training, awareness, and preparedness. Interviewees told us that when exercises didn’t go as planned, the biggest problems were usually communication and employees’ understanding of their roles during the exercise. Non-technical exercises such as walk-throughs and tabletops will help make these processes go more smoothly.
7. Make Sure to Test All IT Infrastructure Concurrently at Least Once Per Year
Waiting longer than a year allows too much change to accumulate in IT environments and personnel, and you need to bring new staff members throughout the organization up to speed on DR plans. The most advanced firms run full DR tests as often as four times per year. In between full tests, most firms conduct component tests that vary in frequency depending on the criticality of the systems and the rate of change in the environment.
8. Identify Members for the Core DR Response Team
People cope in different ways with the stress of working long hours under time and resource constraints, often during nights and weekends. When picking a core response team to lead IT recovery, it’s important to pick people who can work under extreme amounts of pressure (and sleep deprivation). During an exercise or test, identify those individuals who can remain calm and collected.
9. Learn From Your Mistakes
The point of running DR exercises is to find potential barriers to recovery while in a controlled environment. If you aren’t encountering problems during your exercises and tests, it’s more than likely you aren’t looking hard enough, aren’t testing thoroughly enough, or you have designed scenarios for recovery that are too simple. When you complete exercises and tests and you have identified problem areas, use what you have learned to update plans and create best practice documents.
10. Report Results to Stakeholders