Managing System Resources with Azure's Dynamic Threshold Monitoring Feature

The benefits of the cloud are well documented: spinning up virtual machines in minutes, scalable and durable cloud storage, and backup and recovery solutions for businesses of all sizes. However, not all cloud solutions are created equal. This blog takes an in-depth look at one benefit of Azure: managing system resources through dynamic threshold monitoring.

Managing system resources effectively is a key challenge in cloud infrastructure design and management. To make this easier for IT infrastructure teams, all cloud service providers enable setting thresholds for resources and monitoring the resources against those thresholds. This is a clean and simple way to monitor the health of the resources.

However, in reality resource usage is not consistent throughout the day, which calls for different monitoring thresholds based on the time of day. In the single-threshold model, unexpected and near-fatal spikes during non-peak hours are generally missed, because the spike may be well below the single threshold set for monitoring.
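
The blind spot of a single threshold can be shown with a minimal Python sketch. The readings, the 80% limit, and the function name are all made up for illustration; they are not taken from any Azure service.

```python
# Hypothetical hourly CPU readings (percent); all values are illustrative only.
STATIC_THRESHOLD = 80.0

readings = {
    2: 65.0,   # 02:00, off-peak: load is assumed to sit near 10%, so 65% is a severe spike
    14: 75.0,  # 14:00, peak hours: assumed normal load
    15: 85.0,  # 15:00, peak hours: above the static limit
}


def static_alerts(readings, threshold):
    """Return the hours whose reading exceeds the single static threshold."""
    return sorted(hour for hour, value in readings.items() if value > threshold)


alerted_hours = static_alerts(readings, STATIC_THRESHOLD)
print(alerted_hours)
```

Only the 15:00 reading raises an alert. The 02:00 spike, far above what is assumed normal for that hour, stays below the static limit and goes unnoticed.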

Dynamic threshold

Azure Monitor recently released a feature for monitoring server components using dynamic threshold limits, which addresses the scenario described above. Once the feature is enabled, Azure continuously tracks and learns the usage pattern. The feature also lets the user configure the monitoring and alert-triggering process through parameters. The user can set the following parameters when setting up dynamic thresholds for a resource:

  1. Aggregation Type: Allows the user to choose the data aggregation type for calculating the dynamic thresholds.
  2. Threshold conditions: Allows the user to set the “operator” that triggers the alert when the threshold is reached.
  3. Monitoring sensitivity: Sets the sensitivity of the monitoring and alerting process.
  4. Violation count: Allows the user to define how many threshold violations must occur before the alert is triggered.
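
To make the four parameters concrete, here is a minimal Python sketch of how they might fit together. The class, field names, and string values are hypothetical and are not an Azure SDK type. The sketch also assumes that the violation count is checked over a fixed number of recent evaluation periods.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DynamicAlertSettings:
    """Hypothetical container for the four parameters; not an Azure SDK type."""
    aggregation: str          # e.g. "average" or "maximum"
    operator: str             # "greater_than", "less_than", or "greater_or_less_than"
    sensitivity: str          # "high", "medium", or "low"
    violations_to_alert: int  # violations needed to trigger the alert...
    evaluation_periods: int   # ...within this many most recent periods (assumed)


def is_violation(value: float, lower: float, upper: float, operator: str) -> bool:
    """Check one aggregated value against the threshold band using the operator."""
    if operator == "greater_than":
        return value > upper
    if operator == "less_than":
        return value < lower
    if operator == "greater_or_less_than":
        return value > upper or value < lower
    raise ValueError(f"unknown operator: {operator}")


def should_alert(violations: List[bool], settings: DynamicAlertSettings) -> bool:
    """Alert once enough of the most recent periods were violations."""
    recent = violations[-settings.evaluation_periods:]
    return sum(recent) >= settings.violations_to_alert


settings = DynamicAlertSettings("average", "greater_than", "medium", 3, 4)
print(should_alert([False, True, True, True], settings))
```

With these example settings, three violations in the last four periods trigger the alert, while a single isolated violation does not.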

The dynamic threshold feature lets the system continuously aggregate utilization data, build a benchmark from it, and compare that benchmark against real-time utilization data.
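
The aggregate, benchmark, and compare idea can be sketched as follows. This is not Azure's algorithm, which the post does not describe. As a stand-in, the sketch assumes a simple band of mean plus or minus a multiple of the standard deviation for each hour of the day, with made-up sensitivity multipliers and sample data.

```python
from statistics import mean, stdev

# Assumed multipliers: higher sensitivity means a tighter band. Illustrative values.
SENSITIVITY_WIDTH = {"high": 2.0, "medium": 3.0, "low": 4.0}


def learn_bands(history, sensitivity="medium"):
    """Build a (lower, upper) band per hour from past readings.

    history maps an hour of the day to a list of at least two past readings.
    """
    k = SENSITIVITY_WIDTH[sensitivity]
    bands = {}
    for hour, values in history.items():
        centre = mean(values)
        spread = stdev(values)
        bands[hour] = (centre - k * spread, centre + k * spread)
    return bands


def outside_band(hour, value, bands):
    """Return True when a live reading falls outside the band learned for its hour."""
    lower, upper = bands[hour]
    return value < lower or value > upper


# Hypothetical past CPU readings (percent) for an off-peak and a peak hour.
history = {
    2: [9.0, 10.0, 11.0, 10.0, 12.0, 8.0],
    14: [70.0, 75.0, 72.0, 78.0, 74.0, 71.0],
}
bands = learn_bands(history)

print(outside_band(2, 65.0, bands))   # off-peak spike checked against the 02:00 band
print(outside_band(14, 75.0, bands))  # typical peak load checked against the 14:00 band
```

Because each hour has its own band, a 65% reading at 02:00 is flagged even though it would pass a single 80% threshold, while 75% at 14:00 is treated as normal.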

Fig 1 shows the average CPU utilization over a period.

Fig 1 : Dynamic Threshold limits and Pattern set

Based on the alert conditions set by the user, the system triggers the alerts or events that the user has configured.

Fig 2 : Alert message from Azure Monitor

Benefits

In large IT enterprises, resource usage is hardly ever a straight line. Most servers have peak and non-peak utilization. With static resource monitoring, it becomes very difficult, and at times pointless, to set one single threshold to monitor such resources. In these and similar situations, the dynamic threshold helps monitor resource utilization against a pattern instead of a point value. Any deviation in utilization from the pattern is caught for action and remediation.

Some of the key benefits this feature delivers are as follows :

  1. Multiple thresholds: Allows the resources to be monitored against a pattern instead of a single threshold.
  2. Cost optimization: Resources can be resized based on their utilization to save cost.
  3. Outage prevention: Unusual spikes and troughs can be early warnings of larger issues and can prompt proactive or preventive action.

Written by : 

Aravind Raj and Santhosh
