Linux Policies

Before reading about these default policies, note that the Elevated User CPU, Elevated System CPU, and Heavy CPU Load policies assume that the CPU Collector is configured to collect aggregate CPU metrics rather than per-core metrics.

They also assume that the metrics are normalized. To meet both requirements, set percore to FALSE (the default is TRUE) and normalize to TRUE (the default is FALSE) in your agent configuration file. Then save the configuration file and restart the agent to apply the changes. See the Linux agent documentation for more information.
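Normalization matters because thresholds such as 30% and 50% are fixed numbers. The Python sketch below assumes that a non-normalized aggregate CPU value is the sum of the per-core percentages, so it can exceed 100% on a multi-core host, and that normalizing divides that sum by the number of cores. This illustrates that assumption only and is not the agent's collector code. The function name `normalize_by_cores` is hypothetical.

```python
from typing import Sequence


def normalize_by_cores(per_core: Sequence[float]) -> tuple[float, float]:
    """Return (aggregate, normalized) CPU percent for one collection interval.

    Hypothetical illustration: `aggregate` sums the per-core percentages, so on
    an N-core host it can reach N * 100. `normalized` divides by the core count,
    which keeps the value on a 0-100 scale.
    """
    if not per_core:
        raise ValueError("need at least one core")
    aggregate = sum(per_core)
    normalized = aggregate / len(per_core)
    return aggregate, normalized


# Two busy cores and two idle cores on a hypothetical 4-core host.
aggregate, normalized = normalize_by_cores([60.0, 60.0, 0.0, 0.0])
print(aggregate, normalized)  # 120.0 30.0
```

Under this assumption, a ≥ 30% threshold checked against the raw aggregate on a 4-core host would be met when total usage reaches 30 points, only 7.5% per core on average. The normalized value meets it only when average per-core usage reaches 30%.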

Each policy below lists its duration, which is how long its conditions must persist before an event is raised. It also lists its conditions (when a policy has two, both must be met), its category, and a description.

Linux – CPU Threshold Exceeded
Duration: 15 min
Condition 1: cpu.total.utilization.percent has a static threshold > 95%
Condition 2: none
Category: CRITICAL
Description: The CPU on the SERVER instance has exceeded 95% for at least 15 minutes.

Linux – Elevated System CPU
Duration: 30 min
Condition 1: metricly.linux.cpu.total.system.normalized has an upper baseline deviation and a static threshold ≥ 30%
Condition 2: none
Category: INFO
Description: This policy generates an informational event when CPU usage by system processes is higher than normal, but only if the actual value is also at or above 30%. Customers typically don’t want to be notified of deviations in CPU behavior when the actual values are low. You may want to tune the 30% threshold for your environment.

Linux – Elevated User CPU
Duration: 30 min
Condition 1: metricly.linux.cpu.total.user.normalized has an upper baseline deviation and a static threshold ≥ 50%
Condition 2: none
Category: INFO
Description: This policy generates an informational event when CPU usage by user processes is higher than normal, but only if the actual value is also at or above 50%. Customers typically don’t want to be notified of deviations in CPU behavior when the actual values are low. You may want to tune the 50% threshold for your environment.

Linux – Heavy CPU Load
Duration: 15 min
Condition 1: metricly.linux.cpu.total.user.normalized has an upper baseline deviation and an upper contextual deviation
Condition 2: metricly.linux.loadavg.05.normalized has a static threshold > 2
Category: CRITICAL
Description: This CRITICAL event indicates that the server’s CPU is under heavy load. It is based on upper deviations in user CPU utilization together with the normalized 5-minute load average (loadavg.05) being greater than 2. As a rule of thumb, the run queue size (represented by the load average) should not be greater than twice the number of CPUs.

Linux – Disk Utilization Threshold Exceeded
Duration: 15 min
Condition 1: metricly.linux.diskspace.*.byte_percentused has a static threshold > 95%
Condition 2: none
Category: CRITICAL
Description: The consumed disk space on the SERVER instance has exceeded 95% for at least 15 minutes.

Linux – Heavy Disk Load
Duration: 15 min
Condition 1: iostat.*.average_queue_length has an upper baseline deviation and an upper contextual deviation
Condition 2: none
Category: WARNING
Description: This WARNING indicates that the disk is experiencing heavy load, but performance has not yet been impacted.

Linux – Heavy Disk Load with Slow Performance
Duration: 15 min
Condition 1: iostat.*.await has an upper baseline deviation and an upper contextual deviation
Condition 2: iostat.*.average_queue_length has an upper baseline deviation and an upper contextual deviation
Category: CRITICAL
Description: This CRITICAL event indicates that the disk is not only experiencing heavy load but that its performance is also suffering.

Linux – Agent Appears to be Down
Duration: 15 min
Condition 1: metricly.metrics.heartbeat has a static threshold < 1
Condition 2: none
Category: WARNING
Description: A heartbeat has not been received from a Metricly Agent for at least the past 15 minutes; the agent may be down.

Linux – Memory Utilization Threshold Exceeded
Duration: 15 min
Condition 1: metricly.linux.memory.utilization.percent has a static threshold > 95%
Condition 2: none
Category: CRITICAL
Description: This CRITICAL event is raised when memory utilization exceeds 95% for at least 15 minutes.

Elevated Memory Usage
Duration: 30 min
Condition 1: metricly.linux.memory.utilization.percent has an upper baseline deviation and a static threshold > 50%
Condition 2: none
Category: INFO
Description: This policy generates an informational event when memory usage is higher than normal, but only if the actual value is also above 50%. Customers typically don’t want to be notified of deviations in memory usage when the actual values are low. You may want to tune the 50% threshold for your environment.
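The policies above AND their conditions together and require them to hold for the stated duration. The Python sketch below models that reading under several assumptions. Metrics arrive at a fixed, hypothetical 5-minute interval. Each sample carries a hypothetical `baseline_upper` value that stands in for the upper edge of the learned baseline band. A policy fires only if every check holds for every sample in the duration window. Contextual deviations are not modeled. The names `Sample`, `static_above`, `upper_baseline_deviation`, and `policy_fires` are illustrative and are not part of any Metricly API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Sample:
    value: float
    # Hypothetical upper edge of the learned baseline band; inf means "no baseline".
    baseline_upper: float = float("inf")


Predicate = Callable[[Sample], bool]


def static_above(limit: float, inclusive: bool = False) -> Predicate:
    """Static threshold check: '> limit', or '>= limit' when inclusive is True."""
    if inclusive:
        return lambda s: s.value >= limit
    return lambda s: s.value > limit


def upper_baseline_deviation() -> Predicate:
    """True when the value is above the upper edge of its baseline band."""
    return lambda s: s.value > s.baseline_upper


def policy_fires(series: Dict[str, List[Sample]],
                 clauses: List[Tuple[str, List[Predicate]]],
                 duration_min: int, interval_min: int) -> bool:
    """Fire only if every predicate of every clause holds across the whole window."""
    needed = max(1, duration_min // interval_min)  # samples that cover the duration
    for metric, predicates in clauses:
        window = series.get(metric, [])[-needed:]
        if len(window) < needed:
            return False  # not enough data yet to cover the duration
        if not all(p(s) for s in window for p in predicates):
            return False
    return True


USER = "metricly.linux.cpu.total.user.normalized"
LOAD = "metricly.linux.loadavg.05.normalized"

# Linux - Elevated User CPU: one metric, two checks, 30 minutes.
elevated_user_cpu = [(USER, [upper_baseline_deviation(), static_above(50.0, inclusive=True)])]
busy = {USER: [Sample(55.0, 40.0)] * 6}  # six 5-minute samples = 30 minutes
dip = {USER: [Sample(55.0, 40.0)] * 5 + [Sample(45.0, 40.0)]}  # last sample below 50%

# Linux - Heavy CPU Load: two metrics ANDed together, 15 minutes.
heavy_cpu_load = [(USER, [upper_baseline_deviation()]),  # contextual deviation not modeled
                  (LOAD, [static_above(2.0)])]
loaded = {USER: [Sample(55.0, 40.0)] * 3, LOAD: [Sample(2.5)] * 3}

print(policy_fires(busy, elevated_user_cpu, duration_min=30, interval_min=5))   # True
print(policy_fires(dip, elevated_user_cpu, duration_min=30, interval_min=5))    # False
print(policy_fires(loaded, heavy_cpu_load, duration_min=15, interval_min=5))    # True
```

Each check is a small predicate, so a policy is just a list of (metric, checks) pairs. This keeps single-metric policies such as Elevated User CPU and two-metric policies such as Heavy CPU Load in the same shape.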