In a recent blog post, Facebook’s vice president of engineering, Santosh Janardhan, revealed that the company’s engineers issued a command that unintentionally disconnected Facebook’s data centers from the rest of the world, cutting users off from some of the most popular messaging apps.
He stated that the outage was not caused by malicious activity. The company’s systems are designed to audit commands to prevent mistakes of this kind, but a bug in the audit tool kept it from stopping the command that caused the outage.
The company also explained why recovery took so long: the outage took down the very tools engineers needed to investigate and repair it.
Engineers were sent to the data centers in person, but the company’s high level of physical and system security meant it took time for them to get access and remedy the situation.
As Janardhan put it, “Every failure like this is an opportunity to learn and get better. From here on out, our job is to … make sure events like this happen as rarely as possible.”