AWS ELB Policies

Each policy name listed below is prefixed with “AWS ELB –” (for example, “AWS ELB – Elevated Latency”).

| Policy name | Duration | Condition 1 | (and) Condition 2 | Category | Description |
| --- | --- | --- | --- | --- | --- |
| Elevated Backend Error Rate (Low Volume) | 15 min | metricly.aws.elb.httpcodebackenderrorpercent has an upper baseline deviation + an upper contextual deviation | metricly.aws.elb.requestcount has a static threshold < 1,000 | WARNING | This is the first of three policies that look at elevated backend error rates. It covers the low-traffic case, where elevated error rates tend to matter less. For example, a 50% error rate is significant if the total number of requests is 1 million; it is much less so if the total is 10. This policy therefore generates a Warning if error rates are higher than normal and traffic volume is low. By default, “low” means fewer than 1,000 requests; you may wish to tune this for your own environment. |
| Elevated Backend Error Rate (High Volume, Low Error Rate) | 15 min | metricly.aws.elb.httpcodebackenderrorpercent has an upper baseline deviation + an upper contextual deviation + a static threshold < 2% | metricly.aws.elb.requestcount has a static threshold ≥ 1,000 | WARNING | This is the second of three policies that look at elevated backend error rates. For many customers, a sufficiently low error rate is not cause for concern even if it is higher than normal. For example, if the normal error rate is between 0.25% and 0.75%, an observed error rate of 1.1% is higher than expected but may not be worth more than a Warning. This policy therefore looks for cases where the error rate is higher than expected but still under 2%. It also requires that traffic volume not be low, since the low-traffic scenario is covered by the “Elevated Backend Error Rate (Low Volume)” policy. You may wish to tune the 1,000 request count threshold, the 2% error threshold, or both, to better suit your environment. |
| Elevated Backend Error Rate (High Volume, High Error Rate) | 15 min | metricly.aws.elb.httpcodebackenderrorpercent has an upper baseline deviation + an upper contextual deviation + a static threshold ≥ 2% | metricly.aws.elb.requestcount has a static threshold ≥ 1,000 | CRITICAL | This is the third of three policies that look at elevated backend error rates. It looks for both high traffic volume (≥ 1,000 requests) and error rates that are not just higher than normal but also at or above the 2% threshold. In those cases, a Critical event is generated. You may wish to tune the 1,000 request count threshold, the 2% error threshold, or both, to better suit your environment. |
| Elevated Latency | 30 min | aws.elb.latency has an upper baseline deviation + an upper contextual deviation | metricly.aws.elb.requestcount has a static threshold ≥ 1,000 | CRITICAL | This policy generates a Critical event when average latency is higher than normal for half an hour or longer. A minimum number of requests is also required for this policy to trigger, because with too few requests the average can be skewed by outliers. The default request threshold is 1,000; you may wish to tune this for your environment. |
| Surge Queue Utilization Greater Than 5% | 15 min | metricly.aws.elb.surgequeueutilization has a static threshold > 5% | | WARNING | The ELB surge queue holds requests until they can be forwarded to the backend servers. The surge queue can hold a maximum of 1,024 requests, after which it is full and starts rejecting requests. Metricly’s Surge Queue Utilization metric reflects, as a percentage, how full the surge queue currently is. If the surge queue is more than 5% full for 15 minutes or longer, a Warning event is generated. |
| Surge Queue Utilization Greater Than 50% | 15 min | metricly.aws.elb.surgequeueutilization has a static threshold > 50% | | CRITICAL | The ELB surge queue holds requests until they can be forwarded to the backend servers. The surge queue can hold a maximum of 1,024 requests, after which it is full and starts rejecting requests. Metricly’s Surge Queue Utilization metric reflects, as a percentage, how full the surge queue currently is. If the surge queue is more than 50% full for 15 minutes or longer, a Critical event is generated. |
| Unhealthy Host Percent Above 50% | 15 min | metricly.aws.elb.unhealthyhostpercent has a static threshold ≥ 50% + a static threshold < 75% | | WARNING | At least half (50%), but fewer than three quarters (75%), of the hosts associated with this ELB are in an unhealthy state. |
| Unhealthy Host Percent Above 75% | 5 min | metricly.aws.elb.unhealthyhostpercent has a static threshold ≥ 75% | | CRITICAL | At least three quarters (75%) of the hosts associated with this ELB are in an unhealthy state. |
| Elevated ELB Error Rate | 15 min | metricly.aws.elb.httpcodeelberrorpercent has an upper baseline deviation + an upper contextual deviation + a static threshold ≥ 2% | aws.elb.requestcount has a static threshold ≥ 1,000 | CRITICAL | This is another error rate policy, but rather than looking at backend error rates, it looks at errors from the ELB itself. It looks for both high traffic volume (≥ 1,000 requests) and error rates that are not just higher than normal but also at or above the 2% threshold. In those cases, a Critical event is generated. You may wish to tune the 1,000 request count threshold, the 2% error threshold, or both, to better suit your environment. |
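
The three backend error rate policies above partition one decision space: all of them require the error percentage to deviate upward from both its baseline and its contextual band, and the request count and 2% thresholds then decide the category. The Python sketch below illustrates that partition for a single evaluation interval. It is not Metricly's implementation: the function name and parameters are hypothetical, the two deviation flags are assumed to be computed elsewhere, and the 15-minute duration requirement is deliberately left out.

```python
from typing import Optional


def backend_error_severity(
    error_percent: float,
    request_count: int,
    baseline_deviation: bool,
    contextual_deviation: bool,
    request_threshold: int = 1000,   # default "low volume" cutoff from the policies
    error_threshold: float = 2.0,    # default error rate cutoff, in percent
) -> Optional[str]:
    """Return "WARNING", "CRITICAL", or None for one evaluation interval.

    Hypothetical helper: the deviation flags stand in for the upper baseline
    and upper contextual deviation checks, which are not modeled here.
    """
    # Every backend error rate policy requires both upper deviations.
    if not (baseline_deviation and contextual_deviation):
        return None

    # Low Volume policy: fewer requests than the threshold.
    if request_count < request_threshold:
        return "WARNING"

    # High Volume, Low Error Rate policy: elevated but still under the cutoff.
    if error_percent < error_threshold:
        return "WARNING"

    # High Volume, High Error Rate policy: at or above the cutoff.
    return "CRITICAL"
```

For example, `backend_error_severity(1.1, 5000, True, True)` falls into the second policy and yields a Warning. Passing different values for `request_threshold` or `error_threshold` mirrors the tuning the policy descriptions suggest.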
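
The surge queue and unhealthy host policies above rely only on static thresholds, so they are easy to sketch. The Python example below assumes that surge queue utilization is the current queue length divided by the 1,024-request maximum; the source says only that the metric reflects how full the queue is, so that formula is an assumption. All function names are hypothetical, and the duration requirements (5 or 15 minutes) are not modeled.

```python
from typing import Optional

# Maximum number of requests the ELB surge queue can hold, per the policy text.
SURGE_QUEUE_CAPACITY = 1024


def surge_queue_utilization(queue_length: int) -> float:
    """Percent of the surge queue in use (assumed formula: length / capacity)."""
    return 100.0 * queue_length / SURGE_QUEUE_CAPACITY


def surge_queue_severity(utilization_percent: float) -> Optional[str]:
    """Apply the > 50% (Critical) and > 5% (Warning) static thresholds."""
    if utilization_percent > 50.0:
        return "CRITICAL"
    if utilization_percent > 5.0:
        return "WARNING"
    return None


def unhealthy_host_severity(unhealthy_percent: float) -> Optional[str]:
    """Apply the >= 75% (Critical) and >= 50% (Warning) static thresholds."""
    if unhealthy_percent >= 75.0:
        return "CRITICAL"
    if unhealthy_percent >= 50.0:
        return "WARNING"
    return None
```

Note the difference in boundary handling: the surge queue thresholds are strict (exactly 50% is still a Warning), while the unhealthy host thresholds are inclusive (exactly 75% is already Critical).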