How Azure Functions Scales Automatically Based on Events
Azure Functions is a serverless compute service that lets you run your code without worrying about servers, infrastructure, or scaling. On consumption-based plans, you pay only for the compute resources used while your functions are running.
One of the key benefits of Azure Functions is that it scales automatically based on the events that trigger your functions. This means that you don’t have to provision or manage any servers, and your functions can handle any load, from a few requests per day to thousands per second.
In this blog post, we will explain how Azure Functions scales automatically based on events, and what factors affect the scaling behavior.
How Azure Functions scales automatically based on events
Azure Functions uses a component called the scale controller to monitor the rate of events and determine whether to scale out or scale in. The scale controller uses heuristics for each trigger type. For example, when you’re using an Azure Queue storage trigger, it uses target-based scaling: rather than keeping a 1:1 ratio of messages to instances, it aims for a fixed target number of messages processed per instance, so the desired instance count is roughly the queue length divided by that per-instance target. If the backlog grows beyond what the current instances are targeted to handle, it scales out; if the backlog shrinks, it scales in.
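As a rough sketch (illustrative arithmetic, not the actual scale controller code), target-based scaling boils down to dividing the backlog by a per-instance target; the default target of 16 below is an assumption for illustration, corresponding to the default queue batchSize setting in host.json:

```python
import math

def desired_instances(queue_length: int, target_per_instance: int = 16) -> int:
    """Sketch of target-based scaling: instances ~= ceil(backlog / per-instance target)."""
    return max(1, math.ceil(queue_length / target_per_instance))

print(desired_instances(1_000))  # 63: scale out toward 63 instances
print(desired_instances(10))     # 1: a small backlog needs only one instance
```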
The unit of scale for Azure Functions is the function app. A function app is a logical grouping of one or more functions that share the same configuration and runtime settings. When the function app is scaled out, more resources are allocated to run multiple instances of the Azure Functions host. The host is the process that runs your functions and handles the communication with the triggers and bindings. Conversely, as compute demand is reduced, the scale controller removes function host instances.
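As a concrete example, here is a minimal queue-triggered function using the Python v2 programming model (the queue name "orders" and the function body are placeholders). Every function registered on the app object below belongs to the same function app, shares its configuration, and scales with it:

```python
import logging
import azure.functions as func

app = func.FunctionApp()

# A queue-triggered function: the scale controller watches the backlog of
# the "orders" queue and adds or removes instances of the host that runs
# this app.
@app.queue_trigger(arg_name="msg", queue_name="orders",
                   connection="AzureWebJobsStorage")
def process_order(msg: func.QueueMessage) -> None:
    logging.info("Processing message: %s", msg.get_body().decode("utf-8"))
```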
The number of instances is determined by the number of events that trigger your functions, as well as the plan that you choose for your function app. There are three basic hosting plans available for Azure Functions: Consumption plan, Premium plan, and Dedicated (App Service) plan.
- Consumption plan: In this plan, you only pay for the compute resources you use when your functions are running. The scale controller adds or removes instances of the host dynamically based on the number of incoming events. Each instance of the host in the Consumption plan is limited to 1.5 GB of memory and one CPU.
- Premium plan: In this plan, you pay for a minimum number of always-warm instances that can run your functions with no delay after being idle. The scale controller can also add or remove instances based on demand, but with more powerful instances and more options for CPU and memory. You can also connect your function app to a virtual network and use custom Linux images in this plan.
- Dedicated plan: In this plan, you run your functions within an App Service plan at regular App Service plan rates. You have full control over the scaling settings and can use features such as autoscale rules and minimum and maximum instance counts. You can also use App Service Environment (ASE) for a fully isolated and dedicated environment.
What factors affect the scaling behavior
Scaling can vary based on several factors, and apps scale differently based on the triggers and language selected. There are a few intricacies of scaling behaviors to be aware of:
- Maximum instances: A single function app only scales out to the maximum allowed by its plan. A single instance may process more than one message or request at a time though, so there isn’t a set limit on the number of concurrent executions. You can specify a lower maximum to throttle scale as required.
- New instance rate: For HTTP triggers, new instances are allocated at most once per second. For other triggers, new instances are allocated at most once every 30 seconds (these rates are illustrated in the sketch after this list).
- Scale-in delay: The scale controller waits for 10 minutes before scaling in an idle instance. This helps avoid frequent scale-in and scale-out operations.
- Cold start: After your function app has been idle for a number of minutes, the platform may scale the number of instances on which your app runs down to zero. The next request has the added latency of scaling from zero to one. This latency is referred to as a cold start. The number of dependencies required by your function app can affect the cold start time. Cold start is more of an issue for synchronous operations, such as HTTP triggers that must return a response. If cold starts are impacting your functions, consider running in a Premium plan or in a Dedicated plan with the Always on setting enabled.
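To make the pacing rules above concrete, here is a toy calculation based only on the rates quoted in this list; it is illustrative arithmetic, not platform code:

```python
# Toy arithmetic based on the rates quoted above; not actual platform code.
HTTP_ALLOCATION_INTERVAL_S = 1    # HTTP triggers: at most one new instance per second
OTHER_ALLOCATION_INTERVAL_S = 30  # other triggers: at most one every 30 seconds
SCALE_IN_IDLE_DELAY_S = 10 * 60   # an idle instance waits ~10 minutes before removal

def time_to_scale_out(target_instances: int, http: bool = True) -> int:
    """Minimum seconds to grow from 1 instance to the target, given the
    per-trigger allocation rate."""
    interval = HTTP_ALLOCATION_INTERVAL_S if http else OTHER_ALLOCATION_INTERVAL_S
    return (target_instances - 1) * interval

print(time_to_scale_out(100))              # HTTP: 99 seconds to reach 100 instances
print(time_to_scale_out(100, http=False))  # queue, timer, etc.: 2,970 seconds
print(SCALE_IN_IDLE_DELAY_S)               # each idle instance lingers ~600 seconds
```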