For overwhelmed IT teams, AIOps holds the promise of automatically heading off potential business-impacting outages. But some IT leaders are skeptical about whether it can really deliver results.
Rodrigo de la Parra, AIOps Domain Leader at IBM Automation, addressed that skepticism at a recent CanadianCIO virtual roundtable. “It’s more than a buzzword,” said de la Parra. “AIOps takes IT to a more software-driven, agile approach.”
AIOps is the application of artificial intelligence to enhance IT operations, explained de la Parra. It spots issues by using machine learning to analyze the huge amounts of data generated by tools across an organization’s infrastructure. Automation and natural language processing can then be used to help fix problems in real time.
“It’s not a product or a single solution,” said de la Parra. “It’s a journey.” To unlock the value, he said it’s essential to align AIOps to support business needs for improved efficiency and customer service.
De la Parra distinguished between what he called “domain-specific” and “domain-agnostic” tools. He noted that domain-specific tools have great value within their own silo. But the real value, de la Parra said, comes from adding a domain-agnostic approach because it can take feeds from all the tools running in silos and produce a single data source. “This becomes the single source of truth for the analytics and to provide evidence on the root cause to the stakeholders,” said de la Parra.
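To make the idea of a single data source concrete, here is a minimal Python sketch of normalizing feeds from two siloed tools into one timeline. The two tools, their record fields (`ts`, `device`, `msg`, `time`, `service`, `text`) and the `Event` schema are all hypothetical, chosen for illustration only; they do not describe how any particular AIOps product ingests data.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Common schema shared by every feed (hypothetical)."""
    timestamp: datetime
    source: str
    resource: str
    message: str


def from_network_tool(record):
    # Assumed format: epoch seconds in "ts", plus "device" and "msg".
    ts = datetime.fromtimestamp(record["ts"], tz=timezone.utc)
    return Event(ts, "network", record["device"], record["msg"])


def from_app_tool(record):
    # Assumed format: ISO-8601 string in "time", plus "service" and "text".
    ts = datetime.fromisoformat(record["time"])
    return Event(ts, "application", record["service"], record["text"])


def unify(network_records, app_records):
    """Merge both feeds into one list ordered by time."""
    events = [from_network_tool(r) for r in network_records]
    events += [from_app_tool(r) for r in app_records]
    return sorted(events, key=lambda e: e.timestamp)


timeline = unify(
    [{"ts": 1704067260, "device": "switch-7", "msg": "link flapping"}],
    [{"time": "2024-01-01T00:00:30+00:00", "service": "checkout",
      "text": "latency above threshold"}],
)
for event in timeline:
    print(event.timestamp.isoformat(), event.source, event.resource, event.message)
```

Once every feed shares one schema and one clock, events from different silos can be compared side by side, which is the precondition for the cross-domain analytics de la Parra describes.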
How to set up AIOps for success
Successful implementation starts with an operational assessment to identify current problems related to the organization’s business needs. From that, key performance indicators (KPIs) should be established to measure progress. Benchmarking where you are today, looking for real problems and developing measurable KPIs are at the heart of finding and proving the value of AIOps.
For example, de la Parra suggested that organizations could examine their efficiency by tracking the volume of major incidents relative to their applications, or the mean time to detect, acknowledge and resolve incidents. Value could be measured by looking at how much manual work is eliminated, or reductions in the number of issues reported by users.
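As an illustration of how such KPIs might be benchmarked, the following Python sketch computes mean time to detect, acknowledge and resolve from incident timestamps. The `Incident` record and the exact definitions used (detection measured from when the incident occurred, acknowledgement from detection, resolution from occurrence) are assumptions made for this example; organizations define these intervals differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record with four lifecycle timestamps."""
    occurred: datetime
    detected: datetime
    acknowledged: datetime
    resolved: datetime


def mean_minutes(deltas):
    """Average a sequence of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60


def incident_kpis(incidents):
    return {
        # Assumed definition: occurrence -> detection
        "mttd_min": mean_minutes(i.detected - i.occurred for i in incidents),
        # Assumed definition: detection -> acknowledgement
        "mtta_min": mean_minutes(i.acknowledged - i.detected for i in incidents),
        # Assumed definition: occurrence -> resolution
        "mttr_min": mean_minutes(i.resolved - i.occurred for i in incidents),
    }


start = datetime(2024, 1, 1, 9, 0)
minutes = lambda n: timedelta(minutes=n)
sample = [
    Incident(start, start + minutes(10), start + minutes(15), start + minutes(70)),
    Incident(start, start + minutes(20), start + minutes(35), start + minutes(130)),
]
kpis = incident_kpis(sample)
print(kpis)
```

Running the same calculation before and after a pilot gives a like-for-like comparison against the benchmark.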
One participant questioned how long it could take to set up the platform. According to de la Parra, this can be completed within a few weeks in many cases. He recommended starting with a manageably sized pilot to get some meaningful results quickly. Once baseline data is fed into the model, it will start detecting deviations in real time. In addition, de la Parra noted that the IBM Watson AIOps solution comes with pre-defined algorithms that produce models to accelerate the implementation and the return on investment (ROI). “This approach removes the need for data scientists to normalize data, build a data lake, create models, and integrate interfaces to collaborate with the solution such as ChatOps,” he said.
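The general idea of learning a baseline and then flagging deviations can be sketched in a few lines of Python. This example uses a simple z-score test (distance from the baseline mean, measured in standard deviations) as a stand-in; it is not the algorithm used by IBM Watson AIOps, and the metric values and the threshold of 3.0 are assumptions for illustration.

```python
from statistics import mean, stdev


def fit_baseline(samples):
    """Summarize historical values as (mean, sample standard deviation)."""
    return mean(samples), stdev(samples)


def is_deviation(value, baseline, threshold=3.0):
    """Flag a value that lies more than `threshold` deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat baseline: any change counts as a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical response times (ms) collected during a pilot.
history = [100, 102, 98, 101, 99, 100, 103, 97]
baseline = fit_baseline(history)

for reading in (104, 110):
    print(reading, "deviation" if is_deviation(reading, baseline) else "normal")
```

Production systems typically also account for seasonality and trends, which a fixed mean and standard deviation cannot capture.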
Driving business benefits
Despite the discussion, it was clear that many of the participants remained skeptical about whether AIOps can produce a measurable return on investment. As well, there were questions about the trustworthiness of the data and whether domain-specific tools, such as those that monitor security, are sufficient.
The main advantage of domain-agnostic AIOps over domain-specific tools is that it provides complete visibility, said de la Parra. “This is what makes it trustworthy AI,” he said. Decisions are driven by evidence: the platform analyzes different data sources, groups related entities and localizes issues in topology views, providing context, probable cause and the next best action to resolve incidents. This is all done within the confines of policies and compliance requirements.
“It’s understandable to have skepticism over the effectiveness of AIOps given a common preconception around biased AI in general and the effort to implement solid AI models,” said de la Parra. “However, when we talk about AIOps at IBM, we are referring to a specific set of capabilities that provide concrete models to support log anomaly detection, blast radius, seasonal event grouping, next best action among others.”
Another concern raised by the group related to false positives on potential incidents. De la Parra noted that AIOps can analyze whether an issue is having an impact on business systems. If there is no impact, it does not send alerts. “Reducing the noise is critical to allow staff to spend time on higher value tasks,” said de la Parra. A 2021 study from Forrester analyzed the total economic impact of IBM Watson AIOps. It showed a 50 per cent reduction in mean time to resolve (MTTR) and an 80 per cent saving in the time spent remediating false-positive incidents, leading to $623K in savings, along with other benefits such as proactive incident avoidance.
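A rough sketch of impact-based alert filtering, written in Python, might look like the following. The dependency map, component names and alert format are hypothetical, and real platforms derive impact from discovered topology rather than a hand-written dictionary; the point is only to show how an alert with no business impact can be dropped before it reaches staff.

```python
# Hypothetical map of business services to the components they depend on.
SERVICE_DEPENDENCIES = {
    "checkout": {"payments-db", "api-gateway"},
    "customer-portal": {"api-gateway", "auth-service"},
}


def impacted_services(component, dependencies=SERVICE_DEPENDENCIES):
    """Return the business services that depend on the given component."""
    return sorted(
        service
        for service, components in dependencies.items()
        if component in components
    )


def filter_alerts(alerts, dependencies=SERVICE_DEPENDENCIES):
    """Keep only alerts whose component affects at least one business service."""
    forwarded = []
    for alert in alerts:
        services = impacted_services(alert["component"], dependencies)
        if services:
            forwarded.append({**alert, "impacted_services": services})
    return forwarded


raw_alerts = [
    {"component": "api-gateway", "message": "error rate rising"},
    {"component": "batch-test-node", "message": "disk usage high"},
]
forwarded = filter_alerts(raw_alerts)
for alert in forwarded:
    print(alert["component"], "->", ", ".join(alert["impacted_services"]))
```

In this sketch the alert on `batch-test-node` is suppressed because no business service depends on it, while the `api-gateway` alert is forwarded with the affected services attached.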
According to de la Parra, AIOps results in better overall IT service management. Not only does it reduce response time and downtime, it can also be used to look at the appropriate resource allocation for workloads in the cloud.
“Organizations already have the data,” said de la Parra. “AIOps enables the IT team to be more proactive and to become a trusted partner that helps drive business forward.”