How can I make ECS Fargate autoscaling react more rapidly?

Question:

I’m load testing my auto-scaling AWS ECS Fargate stack, which consists of:

  • Application Load Balancer (ALB) with a target group pointing to ECS,
  • ECS Cluster, Service, Task, ApplicationAutoScaling::ScalableTarget, and ApplicationAutoScaling::ScalingPolicy,
  • the application auto scaling policy defines a target tracking policy:
    • type: TargetTrackingScaling,
    • PredefinedMetricType: ALBRequestCountPerTarget,
    • TargetValue (threshold) = 1000 requests per target,
    • the alarm is triggered when 1 datapoint breaches the threshold within a 1-minute evaluation period.
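For reference, the policy in the bullets above maps roughly onto the request body of Application Auto Scaling's PutScalingPolicy API. This sketch builds that request as a plain Python dict; the cluster, service, policy name and ResourceLabel values are hypothetical placeholders, and the cooldowns are illustrative only. Note that the target tracking configuration has no field for the alarm's datapoints or evaluation period: Application Auto Scaling creates and manages those alarms itself.

```python
# Hypothetical identifiers: replace with your own cluster, service, ALB and target group.
CLUSTER = "my-cluster"
SERVICE = "my-service"
# Format: app/<alb-name>/<alb-id>/targetgroup/<tg-name>/<tg-id> (placeholder values)
RESOURCE_LABEL = "app/my-alb/0123456789abcdef/targetgroup/my-tg/fedcba9876543210"


def build_scaling_policy(target_value: float = 1000.0) -> dict:
    """Return keyword arguments for Application Auto Scaling PutScalingPolicy."""
    return {
        "PolicyName": "alb-request-count-target-tracking",  # hypothetical name
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{CLUSTER}/{SERVICE}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_value,  # requests per target the policy aims for
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ALBRequestCountPerTarget",
                "ResourceLabel": RESOURCE_LABEL,
            },
            # Seconds to let a previous scale-out / scale-in take effect (illustrative values).
            "ScaleOutCooldown": 60,
            "ScaleInCooldown": 300,
        },
    }


policy = build_scaling_policy()
print(policy["TargetTrackingScalingPolicyConfiguration"]["TargetValue"])  # 1000.0
```

With boto3 installed and credentials configured, the dict could be sent as `boto3.client("application-autoscaling").put_scaling_policy(**policy)`.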

This all works fine. The alarms do get triggered and I see the scale-out actions happening. But detecting the “threshold breach” feels slow. Here is the timing of my load test and AWS events (collated from different places in the JMeter logs and the AWS console):

Q1: Is there a way to make the alarm trigger faster, ideally as soon as the threshold is breached? To me, this takes about 1m20s too long. It should really scale up in around 1m30s (1m max to detect the ALARM HIGH state + 30 seconds to start the task)…

Note: I documented my CloudFormation stack in this other question I opened today:
Cloudformation ECS Fargate autoscaling target tracking: 1 custom alarm in 1 minute: Failed to execute action

Answer:

You can’t do much about it. ALB sends metrics to CloudWatch at 1-minute intervals. These metrics are also not real-time, so delays are expected, even up to a few minutes, as explained by AWS support and reported in the comments here:

Some delay in metrics is expected, which is inherent for any monitoring systems- as they depend on several variables such as delay with the service publishing the metric, propagation delays and ingestion delay within CloudWatch to name a few. I do understand that a consistent 3 or 4 minute delay for ALB metrics is on the higher side.
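To see where the time goes, here is a rough latency budget under simple assumptions: the load jump starts exactly at the beginning of a 1-minute period, and everything after that period closes (publishing, CloudWatch ingestion, alarm evaluation, the scaling action itself) is lumped into one hypothetical `other_delay_s`. With no extra delay it reproduces the question's 1m30s expectation; the extra 1m20s reported in the question corresponds to `other_delay_s=80`.

```python
from dataclasses import dataclass


@dataclass
class ScaleOutBudget:
    """Rough model of how long a scale-out takes after load jumps.

    Assumes the load increase starts exactly at the beginning of a metric period,
    so the first period that closes already breaches the target.
    """

    metric_period_s: int = 60     # ALB publishes 1-minute datapoints
    datapoints_to_alarm: int = 1  # breaching datapoints required (1 in the question)
    other_delay_s: int = 0        # hypothetical: publishing, ingestion, evaluation, scaling action
    task_start_s: int = 30        # the question's estimate for a new Fargate task to start

    def detection_s(self) -> int:
        # The alarm cannot see a datapoint until its period has closed and been ingested.
        return self.metric_period_s * self.datapoints_to_alarm + self.other_delay_s

    def total_s(self) -> int:
        return self.detection_s() + self.task_start_s


expected = ScaleOutBudget()                  # the question's expectation
observed = ScaleOutBudget(other_delay_s=80)  # adds the 1m20s reported in the question
print(expected.total_s(), observed.total_s())  # 90 170
```

The model makes the point of the answer concrete: only `other_delay_s` and `task_start_s` are outside the 1-minute metric granularity, and the first of those is not under your control.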

You would either have to over-provision your ECS service so it can sustain the increased load until the alarm fires and the scale-out completes, or lower your thresholds.
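Both options can be put into rough numbers. A minimal sketch, assuming load ramps linearly and that you have measured how many requests per minute one task can sustain: `tasks_needed` estimates how many tasks to run now so that capacity holds until new tasks arrive, and `max_target_value` estimates how low the target value must be so that per-task load is still within capacity when the scale-out lands. Both helpers and all example numbers are hypothetical.

```python
import math


def tasks_needed(current_rpm: float, ramp_rpm_per_min: float,
                 reaction_delay_s: float, capacity_per_task_rpm: float) -> int:
    """Tasks to run now so the load is still served when a scale-out completes.

    current_rpm: total requests per minute hitting the ALB now
    ramp_rpm_per_min: growth of that request rate per minute (assumed linear)
    reaction_delay_s: time from the load increase until new tasks serve traffic
    capacity_per_task_rpm: requests per minute one task can sustain (measured)
    """
    load_when_tasks_arrive = current_rpm + ramp_rpm_per_min * reaction_delay_s / 60
    return max(1, math.ceil(load_when_tasks_arrive / capacity_per_task_rpm))


def max_target_value(capacity_per_task_rpm: float, per_task_ramp_rpm_per_min: float,
                     reaction_delay_s: float) -> float:
    """Highest target value that still leaves headroom for the reaction delay."""
    return capacity_per_task_rpm - per_task_ramp_rpm_per_min * reaction_delay_s / 60


print(tasks_needed(10000, 3000, 180, 1500))  # 13
print(max_target_value(1500, 200, 180))      # 900.0
```

The longer the reaction delay, the more headroom either approach has to buy, which is why the metric delay translates directly into extra cost.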

Alternatively, you can publish your own custom metrics, e.g. from your app. These can be high-resolution metrics with intervals as short as 1 second. Your app could also “manually” trigger the alarm. Either approach would reduce the delay you observe.
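A minimal sketch of the custom-metric approach, assuming the app counts its own requests: the hypothetical `RequestCounter` buckets requests per wall-clock second and turns completed seconds into CloudWatch `PutMetricData` entries with `StorageResolution=1` (high resolution). The namespace and metric name are placeholders, and the code only builds the payload, so it runs without AWS credentials.

```python
import time
from collections import Counter
from datetime import datetime, timezone
from typing import Optional


class RequestCounter:
    """Counts requests per wall-clock second (hypothetical helper, not an AWS API)."""

    def __init__(self, namespace: str = "MyApp", metric_name: str = "RequestCount"):
        self.namespace = namespace      # placeholder custom namespace
        self.metric_name = metric_name  # placeholder metric name
        self._counts = Counter()        # epoch second -> number of requests

    def record(self, now: Optional[float] = None) -> None:
        """Call once per handled request."""
        second = int(time.time() if now is None else now)
        self._counts[second] += 1

    def drain(self, before: Optional[float] = None) -> list:
        """Remove completed seconds (strictly before `before`) and return MetricData entries."""
        cutoff = int(time.time() if before is None else before)
        done = sorted(s for s in self._counts if s < cutoff)
        return [
            {
                "MetricName": self.metric_name,
                "Timestamp": datetime.fromtimestamp(s, tz=timezone.utc),
                "Value": float(self._counts.pop(s)),
                "Unit": "Count",
                "StorageResolution": 1,  # 1 = high-resolution (per-second) metric
            }
            for s in done
        ]


counter = RequestCounter()
for t in (100.1, 100.5, 100.9, 101.2):  # simulated request arrival times
    counter.record(now=t)
entries = counter.drain(before=101.5)
print(len(entries), entries[0]["Value"])  # 1 3.0
```

Sending is left to the caller, e.g. `boto3.client("cloudwatch").put_metric_data(Namespace=counter.namespace, MetricData=entries)` on a short timer. Alarms on high-resolution metrics can use periods shorter than 60 seconds (for example 10 or 30 seconds), which is where the latency saving comes from.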
