Geo-redundancy: Key to resiliency when disaster strikes

Mathieu Moquin

2 years ago

The summer of 2023 has been one for the books, with record-high temperatures sweeping across Canada and the worst wildfire season our country has ever seen. Such extreme weather events are turning up the heat on enterprise operations, notably IT infrastructure, increasing the risk of downtime and data loss.

For instance, a power outage due to wildfires could take out an entire data center for hours if not days, and a severe heat wave can cause cooling units in a server room to fail. Whether large- or small-scale, unplanned outages have the potential to disrupt operations, reduce employee productivity and lead to reputational damage – all of which contribute to lost revenue.

According to a 2022 report from Uptime Institute, the financial cost of IT outages has soared in recent years, with over 60 per cent of failures resulting in at least $100,000 in total losses, compared to just 39 per cent in 2019. Even a minor outage can have long-term business impact.

In today’s 24/7 digital economy, enterprises need to safeguard their IT infrastructure from the growing threat of hazardous weather or other natural disasters. The pressure is on IT teams to strengthen their business continuity and disaster recovery (BCDR) strategy and make mission-critical systems more resilient to any type of catastrophic system failure.

Local redundancy isn’t enough

One way to keep IT systems running during an outage is by having a redundant power supply. Data centers, as well as hospitals and airports, typically have such solutions in place via locally stored electricity in batteries or with backup generators.

Having redundant components within a data center is standard practice for reducing the risk of downtime and data loss, but what happens if the entire site is shut down by a single event, such as a wildfire-related power outage?

In this scenario, local operations might be paralyzed for an extended period, but an enterprise’s most critical applications can continue running in a backup or cloud environment, provided that the secondary data center is not affected by the same outage. With failover in place, traffic is automatically rerouted to the secondary site with minimal downtime for users.
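The rerouting decision described above can be sketched in a few lines. This is a minimal illustration only, not any particular product's failover logic: the `Site` type, the site names, and the `is_healthy` callback are hypothetical, and a real deployment would typically rely on DNS or load-balancer health checks rather than an in-process function.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Site:
    name: str      # hypothetical site label
    priority: int  # lower number = preferred site


def choose_active_site(
    sites: Sequence[Site],
    is_healthy: Callable[[Site], bool],
) -> Site:
    """Return the highest-priority site that passes its health check."""
    for site in sorted(sites, key=lambda s: s.priority):
        if is_healthy(site):
            return site
    raise RuntimeError("no healthy site available")


primary = Site("primary-dc", priority=0)
secondary = Site("secondary-dc", priority=1)

# Simulate a site-wide outage at the primary data center.
down = {"primary-dc"}
active = choose_active_site([primary, secondary], lambda s: s.name not in down)
print(active.name)
```

Once the primary passes its health check again, the same function selects it, which is one simple way to model failback.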

By replicating applications and infrastructure across two or more physically disparate locations – known as geographic redundancy – enterprises can readily bounce back from unexpected disruptions and ensure high availability of IT services. Ideally, there is enough distance between the data centers, or availability zones, that an outage affecting one site will not affect the other.
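As a rough sanity check on site separation, the great-circle distance between two candidate locations can be estimated with the haversine formula. The sketch below is illustrative only: the coordinates are approximate city-centre values for Montreal and Toronto, and the 100 km threshold is an arbitrary placeholder rather than a recommendation. The right distance depends on the hazards being planned for.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def far_enough(site_a, site_b, min_km):
    """True if the two (lat, lon) sites are at least min_km apart."""
    return haversine_km(*site_a, *site_b) >= min_km


# Approximate city-centre coordinates (illustrative values).
montreal = (45.50, -73.57)
toronto = (43.65, -79.38)

distance = haversine_km(*montreal, *toronto)
print(f"{distance:.0f} km apart")
print(far_enough(montreal, toronto, min_km=100))  # placeholder threshold
```

Distance alone does not guarantee independence. Two sites can still share a power grid, a network carrier, or exposure to the same weather system.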

In Canada, data residency matters, and organizations want to know where their data is hosted. They should work with service providers that keep multiple geo-redundant facilities within the country’s borders and can meet local regulatory compliance requirements.

Choose automation and replication wisely

Geographic redundancy for an entire site provides the highest level of resilience within a BCDR strategy. However, managing a distributed hybrid environment can be complex, which makes automation essential. The more automation the better, but it comes at a higher price. It’s important to select an automation platform that isn’t tied to a specific vendor and instead works with any provider.

The choice of replication solution is yet another significant consideration. Some virtualization platform providers offer solutions that work only within their own stack, and older technologies that use snapshots can negatively impact the performance of a production environment.

Hardware-based solutions offer certain functionality, but replication is better handled asynchronously by specialized software that is based on continuous data protection (CDP) and runs at the hypervisor level. Such a solution can be switched on and become fully operational in a short timeframe, without any investment in hardware.
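To make the idea of asynchronous, journal-based replication more concrete, here is a toy model. It is a conceptual sketch under simplifying assumptions, not how any specific CDP product works: the `Primary` and `ReplicaJournal` classes and the block names are hypothetical, and real solutions intercept I/O at the hypervisor level and ship it over a network. The point is that writes are acknowledged locally first and shipped later, and that the journal lets the replica be restored to any recorded point in time rather than to a periodic snapshot.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ReplicaJournal:
    """Toy journal at the secondary site: every write is kept in order."""
    entries: List[Tuple[int, str, bytes]] = field(default_factory=list)

    def append(self, seq: int, block: str, data: bytes) -> None:
        self.entries.append((seq, block, data))

    def restore(self, up_to_seq: int) -> Dict[str, bytes]:
        """Rebuild the disk image as of a chosen write sequence number."""
        state: Dict[str, bytes] = {}
        for seq, block, data in self.entries:
            if seq > up_to_seq:
                break
            state[block] = data
        return state


class Primary:
    """Toy production site: acknowledges writes before they are replicated."""

    def __init__(self, journal: ReplicaJournal) -> None:
        self.disk: Dict[str, bytes] = {}
        self.pending: deque = deque()
        self.journal = journal
        self.seq = 0

    def write(self, block: str, data: bytes) -> int:
        self.seq += 1
        self.disk[block] = data
        self.pending.append((self.seq, block, data))
        return self.seq  # returned without waiting for the replica

    def ship(self) -> None:
        """Drain queued writes to the replica journal (the async step)."""
        while self.pending:
            self.journal.append(*self.pending.popleft())


journal = ReplicaJournal()
site = Primary(journal)
site.write("blk0", b"v1")
site.write("blk0", b"v2")
site.write("blk1", b"x")
site.ship()

print(journal.restore(up_to_seq=1))  # state after the first write only
print(journal.restore(up_to_seq=3))  # latest replicated state
```

Because shipping is decoupled from the write path, production performance is not held up by the link to the secondary site. The trade-off is that any writes still in the pending queue when a disaster strikes are lost.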

Geo-redundancy for business continuity

Extreme weather events are on the rise, threatening the IT systems that Canadian businesses and customers rely on daily. When unplanned outages do occur, getting systems back up and running quickly can mean the difference between a successful enterprise and a defunct one. Geographic redundancy in the cloud is one way organizations can minimize downtime and ensure their critical systems are available to users, even in the face of a disaster.