Business agility isn’t just a marketing phrase – it reflects the attitude of users, both consumer and business. Now that they’re used to the almost instant gratification in the consumer app world, they expect the same of business applications. People won’t wait years, or even months, for fixes and upgrades. If their experience is affected, they tend to move elsewhere almost immediately.
To further complicate matters, the cloud and the growth of mobile-first app development have fuelled many changes in the way software is developed and deployed. One of the biggest changes is the growth of container platforms such as Docker for deploying microservices.
While Docker helps teams rapidly develop and deploy their cloud-based applications, supporting the goal of continuous delivery to satisfy customer need for fast fixes and new features, it also makes application performance monitoring (APM) much more complex. Unlike monolithic applications, microservices-based applications housed in Docker containers are continually in flux. Containers come and go as required, and may move among machines in the cloud. By the time a problem is identified, the container in question may have disappeared, along with the critical information needed to troubleshoot the issue.
In this environment, APM faces three key challenges:
- To prevent alert storms and to surface key performance metrics, teams need intelligent visualization and analytics.
- Microservice architectures increase the number of components, dependencies, and communication flows, all of which need to be automatically mapped so telemetry data can continue to flow as containers move.
- Microservice performance is impacted by both application metrics and container-specific factors, so teams need to correlate app and container performance to get a complete picture for root cause analysis of issues.
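As a rough illustration of the third challenge, the sketch below lines up one application metric with two container metrics sampled at the same timestamps, then ranks the container metrics by how closely they track the application metric. The metric names and sample values are invented for illustration; a real APM tool would read them from its own telemetry store and would likely use more robust statistics.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


# Hypothetical samples taken at the same six timestamps.
app_latency_ms = [120, 125, 130, 210, 340, 390]      # application metric
container_cpu_pct = [35, 36, 40, 71, 93, 97]         # container metric
container_net_kbps = [510, 495, 505, 500, 498, 507]  # container metric

candidates = {
    "cpu_pct": pearson(app_latency_ms, container_cpu_pct),
    "net_kbps": pearson(app_latency_ms, container_net_kbps),
}
# Rank container metrics by how strongly they track the latency change.
ranked = sorted(candidates, key=lambda name: abs(candidates[name]), reverse=True)
print(ranked[0])
```

With these made-up numbers, container CPU tracks latency far more closely than network throughput does. A strong correlation is only a lead for root cause analysis, not proof of cause.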
Working at scale
An APM solution that meets these needs in the dynamic Docker world has to automatically detect and map new containers, their dependencies, and the communication flows among them, and adapt as old containers disappear or move. It also needs to support Docker clusters so it can track and monitor microservices and distributed cloud apps. And it needs to do this at scale: where a traditional stack of operating system and application may have 150 metrics, a 10-container cluster on one host could have 1,150.
This leads to a methodology that is the opposite of the technique most often used in server monitoring, where the monitoring solution polls servers to determine their status. Since containers are transient, they can’t easily be located for polling, and may have disappeared by the time the next polling interval arrives. Instead, it makes more sense to track clusters of containers and have them push their health data to a monitoring tool.
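A minimal sketch of the push model might look like the following. The `PushCollector` class, its method names, the 30-second staleness window, and the `cpu_pct` metric are all assumptions made for illustration, not any particular product's API. The point is that the collector never needs to know where a container lives: it records whatever arrives, groups it by cluster, and forgets containers that stop reporting.

```python
import time


class PushCollector:
    """Receives health reports pushed by containers, grouped by cluster."""

    def __init__(self, stale_after_s=30.0):
        self.stale_after_s = stale_after_s
        # cluster name -> {container id -> (last report time, metrics)}
        self.clusters = {}

    def report(self, cluster, container_id, metrics, now=None):
        """Called by (or on behalf of) a container to push its health data."""
        now = time.time() if now is None else now
        self.clusters.setdefault(cluster, {})[container_id] = (now, dict(metrics))

    def expire(self, now=None):
        """Drop containers that stopped reporting; return what was removed."""
        now = time.time() if now is None else now
        removed = []
        for cluster, members in self.clusters.items():
            stale = [cid for cid, (seen, _) in members.items()
                     if now - seen > self.stale_after_s]
            for cid in stale:
                del members[cid]
                removed.append((cluster, cid))
        return removed

    def cluster_health(self, cluster):
        """Summarise a cluster from whichever containers are live right now."""
        members = self.clusters.get(cluster, {})
        cpu = [m["cpu_pct"] for _, m in members.values() if "cpu_pct" in m]
        return {
            "containers": len(members),
            "avg_cpu_pct": sum(cpu) / len(cpu) if cpu else None,
        }


# Hypothetical timeline (timestamps in seconds, passed explicitly for clarity).
collector = PushCollector(stale_after_s=30.0)
collector.report("checkout", "c1", {"cpu_pct": 40.0}, now=0.0)
collector.report("checkout", "c2", {"cpu_pct": 60.0}, now=0.0)
collector.report("checkout", "c2", {"cpu_pct": 80.0}, now=25.0)  # c1 goes quiet
collector.report("checkout", "c3", {"cpu_pct": 20.0}, now=40.0)  # new container
gone = collector.expire(now=45.0)
health = collector.cluster_health("checkout")
print(gone, health)
```

Here the cluster summary stays meaningful even though one container vanished and another appeared between reports, which is the situation a polling loop handles poorly.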
Cutting through the noise
To add to the complexity, many microservices run in multiple containers, each running a separate process. To determine the health of the microservice, the containers involved should be monitored as a unit to provide insight into how the application is performing. Given the huge number of containers and microservices in a typical environment, however, pattern recognition and anomaly detection have to be automated using machine learning and artificial intelligence. That automation cuts down on the “noise” seen in older monitoring solutions, so teams don’t miss real problems buried among dozens of insignificant alerts.
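The idea of monitoring a microservice as a unit can be sketched as follows. This example deliberately swaps in a very simple statistical check (distance from a rolling baseline, measured in standard deviations) as a stand-in for the machine learning a real product would use. The function names, window length, threshold, and error counts are all hypothetical.

```python
from statistics import mean, stdev


def service_series(samples):
    """Collapse per-container samples into one value per interval.

    `samples` is a list of intervals; each interval maps container id -> error
    count. Containers may come and go, so each interval is summed independently.
    """
    return [sum(interval.values()) for interval in samples]


def anomalies(series, baseline_len=5, threshold=3.0):
    """Return indexes whose value sits more than `threshold` standard
    deviations from the mean of the preceding `baseline_len` points."""
    flagged = []
    for i in range(baseline_len, len(series)):
        window = series[i - baseline_len:i]
        mu, sigma = mean(window), stdev(window)
        sigma = max(sigma, 1e-9)  # avoid dividing by zero on a flat baseline
        if abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical error counts for one microservice spread over several containers.
samples = [
    {"a": 2, "b": 3},
    {"a": 3, "b": 2},
    {"a": 2, "b": 4},
    {"b": 3, "c": 2},    # container "a" replaced by "c"
    {"b": 2, "c": 4},
    {"b": 3, "c": 3},
    {"b": 20, "c": 25},  # service-wide spike
]
series = service_series(samples)
flagged = anomalies(series)
print(series, flagged)
```

Because the check runs on the service-level total, the replacement of one container by another raises no alert, while the spike across the whole service does.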
A well-performing automated system will generate early warnings by identifying abnormal or improper behaviour in the services, applications, and infrastructure. Some can also determine the root cause of the problems, and build automated triage or mitigation workflows. This gives engineers the time to perform high-value strategic tasks rather than dealing with repetitive, often error-prone manual work.
The DevOps OODA loop – observe, orient, decide, and act – can be mapped to containerized production environments to provide a workflow for managing Docker containers.
- Observe. Receive alerts and notifications from a monitoring system that filters out the noise.
- Orient. Take the information from the logs and monitoring system and use it to identify the symptoms of the issue. It is critical to know the exact source of the information, with minimal noise.
- Decide. Based on the symptoms identified during the orientation process, decide what action to take – perhaps make a configuration change, or roll back a faulty application update.
- Act. Make the necessary changes. The container platform and tools must allow for fast action once a decision is made.
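The four steps above can be sketched as a small pipeline. Everything here is hypothetical: the alert fields, the severity cut-off, the decision rules, and the `execute` callback that stands in for whatever the container platform offers. A real workflow would involve far richer rules and human judgement.

```python
def observe(alerts, min_severity=2):
    """Observe: keep only alerts that pass the noise filter."""
    return [a for a in alerts if a["severity"] >= min_severity]


def orient(alerts):
    """Orient: group symptoms by the service they came from."""
    symptoms = {}
    for alert in alerts:
        symptoms.setdefault(alert["service"], []).append(alert["symptom"])
    return symptoms


def decide(symptoms, recently_deployed):
    """Decide: pick an action per service from its symptoms."""
    plan = {}
    for service, seen in symptoms.items():
        if service in recently_deployed:
            plan[service] = "roll_back"
        elif "out_of_memory" in seen:
            plan[service] = "raise_memory_limit"
        else:
            plan[service] = "escalate_to_engineer"
    return plan


def act(plan, execute):
    """Act: hand each decision to the platform via `execute`."""
    return [execute(service, action) for service, action in plan.items()]


# Hypothetical alerts; "search" is filtered out as noise in the observe step.
alerts = [
    {"service": "checkout", "severity": 3, "symptom": "error_rate_high"},
    {"service": "search", "severity": 1, "symptom": "latency_blip"},
    {"service": "cart", "severity": 2, "symptom": "out_of_memory"},
]
plan = decide(orient(observe(alerts)), recently_deployed={"checkout"})
log = act(plan, execute=lambda service, action: f"{service}: {action}")
print(log)
```

Keeping each step as a separate function mirrors the loop itself: the quality of the decision depends on how well the earlier steps filtered and attributed the information.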
Enterprise container implementations must support the OODA loop. To enable this, any container management and monitoring system must provide accurate information and allow quick action to remediate issues. Developers also need to understand how Docker containers are built and managed so they can create applications that work well in this relatively new environment.