The job of managing an organization’s IT operations can be overwhelming. Now, IT leaders are looking to artificial intelligence (AI) to relieve the pressure.
Today’s infrastructure is a dynamic mix of on-premises and multi-cloud environments running on bare metal, virtualized servers, and containers, said Rodrigo de la Parra, AIOps Domain Leader at IBM, during a recent CanadianCIO Virtual Roundtable. Traditional management solutions struggle to monitor the entire system end-to-end, making it difficult to spot issues before they impact customers.
“The amount of information from all of the tools is skyrocketing,” said de la Parra. “We have to reduce the noise and empower operations teams to do what they do best.”
The answer lies in applying artificial intelligence to IT operations (AIOps), de la Parra said. “It allows you to optimize the capacity of IT operations,” he said. “Our goal is to avoid issues proactively by providing context and promote collaboration among siloed teams to eliminate manual, labour-intensive work so the team can focus on more valuable projects that accelerate transformation.”
What is AIOps?
AIOps is a platform that leverages AI, machine learning, big data from different sources, and natural language processing, explained de la Parra. When combined with automation, “it allows IT teams to scale mundane activities efficiently,” he said.
The platform provides visibility into performance data across all environments. It analyzes behavioural data to look for anomalies, alerts IT staff to problems and their root causes, and recommends solutions. As the system and its machine learning models mature, AIOps can solve issues without human intervention, de la Parra said. “It is a domain-agnostic platform that provides a holistic view and identifies issues in real time,” said de la Parra. “There are no blind spots.”
The system relies on a centralized data lake as a “single source of truth,” said de la Parra. He noted that it can analyze structured and unstructured data with no need for tagging. The AI models use the data source to create a baseline of the way that a system is normally performing so that it can spot any deviations.
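As a rough illustration of the baseline idea described above, the following minimal Python sketch learns what "normal" looks like from historical samples and flags observations that deviate from it. It is not IBM's method: the z-score technique, the function names, the threshold of three standard deviations and the latency figures are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize normal behaviour as a (mean, standard deviation) pair."""
    return mean(samples), stdev(samples)


def find_deviations(baseline, observations, threshold=3.0):
    """Return (index, value) pairs lying more than `threshold`
    standard deviations away from the baseline mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat baseline: any different value is a deviation.
        return [(i, x) for i, x in enumerate(observations) if x != mu]
    return [
        (i, x)
        for i, x in enumerate(observations)
        if abs(x - mu) / sigma > threshold
    ]


# Hypothetical response times (ms) collected while the system was healthy.
normal_latency_ms = [102, 98, 101, 99, 100, 103, 97, 100]
baseline = build_baseline(normal_latency_ms)

# New observations; the 180 ms reading is far outside the learned range.
anomalies = find_deviations(baseline, [101, 99, 180, 100])
print(anomalies)  # [(2, 180)]
```

A production platform would presumably use far richer models across many metrics and log sources, but the principle is the same: learn the normal range first, then measure how far new data strays from it.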
Typically, the platform can be trained in three or four weeks, de la Parra said. It finds anomalies with a high degree of accuracy, depending on the quality and completeness of the data sources provided, and uses a variety of out-of-the-box techniques to reduce the noise. IBM Watson AIOps creates stories in ChatOps tools such as Slack or MS Teams, providing evidence for the anomalies found.
What are the advantages of AIOps?
Individual management tools can’t intelligently sort out the significant events or correlate the data across different layers of applications and environments. With AIOps, IT operations teams gain real-time insights and predictive analyses to respond to issues faster. It also makes it easier to meet user and customer service level requirements, said de la Parra. “The system enables them to move from a reactive to a proactive unsupervised approach.”
The platform reduces the noise created by false positives and massive amounts of data. It also prevents compliance exposures, said de la Parra. He noted that AIOps isn’t only for operations. It can also be used earlier in the application lifecycle (shift-left) to analyze new implementations and predict their risk, based on similar past change requests and the incidents associated with them, so that potential problems can be mitigated before release.
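To make the shift-left idea concrete, here is a minimal Python sketch that scores a new change request by looking at similar past changes and whether they led to incidents. Everything in it is an assumption for illustration: the word-overlap (Jaccard) similarity, the 0.3 cut-off, the function names and the sample change history are hypothetical, and the article does not describe how the IBM platform actually computes risk.

```python
def tokens(text):
    """Split a change description into a set of lowercase words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def change_risk(new_change, history, min_similarity=0.3):
    """Estimate risk as the similarity-weighted share of comparable
    past changes that caused an incident. Returns None when no past
    change is similar enough to compare against."""
    new_tokens = tokens(new_change)
    weighted_incidents = 0.0
    total_weight = 0.0
    for description, caused_incident in history:
        similarity = jaccard(new_tokens, tokens(description))
        if similarity >= min_similarity:
            total_weight += similarity
            if caused_incident:
                weighted_incidents += similarity
    return weighted_incidents / total_weight if total_weight else None


# Hypothetical history: (change description, did it cause an incident?)
past_changes = [
    ("upgrade database driver on payments service", True),
    ("upgrade database driver on orders service", False),
    ("rotate tls certificates on gateway", False),
]

risk = change_risk("upgrade database driver on billing service", past_changes)
print(risk)  # 0.5: one of the two comparable changes caused an incident
```

A score like this could be surfaced during change review so that higher-risk implementations get extra testing or a staged rollout.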
Ultimately, one of the biggest benefits is that AIOps gives time back to the IT operations team. “It gives them a better work-life balance so they are more focused on valuable activities,” said de la Parra. “When you put all of the pieces together to optimize performance and reliability, it makes IT operations a better partner for the business.”