If you are looking to move some or all of your IT infrastructure to the cloud, one advantage that may help persuade people to get on board is autoscaling. With autoscaling, you can give your website or web app a real boost by making the amount of resources it uses responsive to actual usage at any given time.

Autoscaling is a powerful feature found in many cloud computing services that allows resources to be added or removed based on the load at a particular time. Basically, resources such as additional servers can be set up, much as in load balancing, but the number of them in use at any given time is based on the number of requests being received at that time.

The term autoscaling originated with Amazon Web Services (AWS); however, many other cloud services now offer this feature as well, so you have quite a few options to choose from to best fit the needs of your company.

How Autoscaling Works

Autoscaling can be beneficial both for handling traffic spikes and for the bottom line. With autoscaling, additional resources are only put in place when needed, so that you do not have to, for instance, pay to have a number of extra servers running all the time in order to handle the possibility of a heavy load at some point. Instead, you simply pay for them when they are needed, which can potentially save you quite a bit of money!
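To make the cost argument concrete, here is a small arithmetic sketch. The price (10 cents per server-hour) and the daily traffic shape (six busy hours) are made-up figures chosen only for illustration, and the function names are hypothetical; real pricing and traffic will differ.

```python
# Hypothetical figures: 10 cents per server-hour, six busy hours per day.
PRICE_CENTS_PER_SERVER_HOUR = 10


def daily_cost_cents(servers_per_hour, price_cents=PRICE_CENTS_PER_SERVER_HOUR):
    """Cost of one day, given the number of servers running in each hour."""
    return sum(servers_per_hour) * price_cents


# Always-on: eight servers around the clock, just in case.
always_on = [8] * 24

# Autoscaled: one server for 18 quiet hours, eight for 6 busy hours.
autoscaled = [1] * 18 + [8] * 6

print(daily_cost_cents(always_on))   # 1920 cents
print(daily_cost_cents(autoscaled))  # 660 cents
```

Under these assumed numbers, the autoscaled setup uses 66 server-hours a day instead of 192, which is where the savings come from.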

Many services set up autoscaling so that you can have a minimum and a maximum number of servers that are allowed to be running to handle requests. For example, you could set a minimum of one and a maximum of eight. If the number of requests is minimal, then only one server will be running. If load becomes very heavy, then all eight servers can be running in order to handle it. Once traffic decreases again, some of the servers can be shut down.
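The minimum/maximum rule described above can be sketched in a few lines. This is a simplified illustration, not any provider's actual policy: the function name and the assumption that each server handles a fixed 100 requests per second are hypothetical.

```python
import math


def desired_servers(requests_per_sec, capacity_per_server=100,
                    min_servers=1, max_servers=8):
    """Return how many servers should run for the current load.

    capacity_per_server is an assumed, fixed number of requests per
    second that one server can handle.
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    needed = math.ceil(requests_per_sec / capacity_per_server)
    # Never drop below the minimum or grow past the maximum.
    return max(min_servers, min(max_servers, needed))


for load in (20, 450, 5000):
    print(load, desired_servers(load))
```

With these assumed values, a light load of 20 requests per second keeps one server running, 450 requests per second calls for five, and 5000 is capped at the maximum of eight.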

A fixed cycle is a related technique that uses a static schedule to turn resources on or off for expected traffic patterns. The downside to this technique, though, is that it cannot account for traffic that deviates from the expected pattern.

For instance, an unexpected amount of traffic could arrive in the middle of the night simply because more people happened to be up late surfing the web that night. This unexpected traffic pattern could cause downtime, as the fixed cycle had only a minimum number of servers running and wasn’t able to handle the load. With autoscaling, the surprise usage would be handled automatically, thus helping to avoid the downtime.
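The difference between the two approaches in this late-night scenario can be sketched as follows. The schedule, the per-server capacity of 100 requests per second, and all function names are hypothetical and exist only to illustrate the comparison.

```python
import math


def scheduled_servers(hour):
    """Fixed cycle: six servers during the day, one overnight (assumed schedule)."""
    return 6 if 9 <= hour < 21 else 1


def reactive_servers(requests_per_sec, capacity_per_server=100,
                     min_servers=1, max_servers=8):
    """Autoscaling: size the pool from the load actually observed."""
    needed = math.ceil(requests_per_sec / capacity_per_server)
    return max(min_servers, min(max_servers, needed))


def is_overloaded(servers, requests_per_sec, capacity_per_server=100):
    """True when the running servers cannot absorb the incoming load."""
    return requests_per_sec > servers * capacity_per_server


# An unexpected spike of 450 requests per second at 2 a.m.
hour, load = 2, 450
print(is_overloaded(scheduled_servers(hour), load))  # True
print(is_overloaded(reactive_servers(load), load))   # False
```

The fixed schedule has only one server up at 2 a.m., so the spike overwhelms it, while the reactive rule brings up five servers and absorbs the load.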

How Netflix Handles Autoscaling


In 2013, Netflix published a report showing how they were able to make use of two forms of autoscaling: one provided by AWS, and one they customized to work with AWS for a few of their specific use cases. The main goal for Netflix has been to always have a scalable system that has minimal outages, and to be able to respond quickly should any outages occur.

The first piece of technology they used to meet their needs was Amazon Auto Scaling (AAS), provided by AWS. Netflix had high praise for the effectiveness of its autoscaling feature.

Source: Netflix

Netflix also went a step further by adding a customized autoscaling engine to handle some of their specific use cases, such as a rapid spike in demand, outages (which they noted were often quickly followed by a “retry storm”), and variable traffic patterns. They noted that neither scaling up aggressively nor always running more servers than required was a cost-optimal solution.

As a result, they built what became known as Scryer, a predictive autoscaling engine that predicts resource needs based on daily traffic patterns. By adding Scryer to the mix, they noted some additional benefits, including better service availability and a reduction in EC2 costs.
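To give a rough feel for what "predictive" means here, the sketch below forecasts the load for a given hour by averaging the same hour over previous days, then provisions servers ahead of time with some headroom. This is not Scryer's actual algorithm, which the source does not describe; the averaging approach, the 25% headroom, the sample traffic, and every name in the code are assumptions made for illustration.

```python
import math
from statistics import mean


def predict_load(history, hour):
    """Forecast requests/sec for an hour as the mean of that hour on past days.

    history is a list of days, each a list of 24 hourly request rates.
    """
    return mean(day[hour] for day in history)


def servers_for_forecast(history, hour, capacity_per_server=100,
                         headroom=0.25, min_servers=1):
    """Provision for the forecast plus a safety margin (assumed 25%)."""
    forecast = predict_load(history, hour)
    needed = math.ceil(forecast * (1 + headroom) / capacity_per_server)
    return max(min_servers, needed)


def make_day(night_rps, evening_rps):
    """Build a made-up day: busy from 18:00 to 22:59, quiet otherwise."""
    return [evening_rps if 18 <= h < 23 else night_rps for h in range(24)]


history = [make_day(40, 900), make_day(50, 1000), make_day(60, 1100)]

print(servers_for_forecast(history, hour=20))  # 13
print(servers_for_forecast(history, hour=3))   # 1
```

Because the forecast is available before the traffic arrives, servers can be started ahead of the evening peak rather than in reaction to it. A reactive rule is still needed alongside it for traffic that the history does not predict.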

Scryer in action. Source: Netflix

In the end, Netflix was able to create a hybrid of predictive autoscaling through Scryer and reactive autoscaling with AAS, and felt that the combination helped provide a robust solution for them.

How Facebook Handles Autoscaling

In August 2014, Facebook published a post from one of their engineers which described how they use autoscaling to significantly lower their energy costs.

Autoscale led to a 27% power savings around midnight (and, as expected, the power saving was 0% around peak hours). The average power saving over a 24-hour cycle is about 10-15% for different web clusters.

Source: Facebook

Facebook had a goal of remaining energy-efficient and keeping a minimal environmental impact as they continued to scale. Since their servers were handling billions of requests, they were already using a modified round-robin system for load balancing. While this was helpful, they felt they could save more energy by adding some autoscaling features into the mix.

To accomplish their goal, they implemented an autoscaling solution that pushed workload to a server until it was handling a medium workload, and that used a minimal number of servers when the overall workload was low (in their case, near midnight). This resulted in a great deal of savings when compared to a typical cluster.
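The idea of concentrating load on as few servers as possible can be sketched as below. This is a simplified illustration rather than Facebook's actual implementation: treating "medium workload" as 50% of a server's capacity, the capacity and pool figures, and the function name are all assumptions.

```python
import math


def active_pool_size(total_rps, per_server_capacity=200,
                     target_utilization=0.5, pool_size=100, min_active=1):
    """Number of servers to keep active so each runs near the target load.

    target_utilization=0.5 is an assumed stand-in for "medium workload".
    Servers outside the active pool can idle and draw less power.
    """
    per_server_target = per_server_capacity * target_utilization
    needed = math.ceil(total_rps / per_server_target)
    return max(min_active, min(pool_size, needed))


# Made-up traffic levels for a 100-server cluster.
print(active_pool_size(1500))   # 15 active near midnight
print(active_pool_size(12000))  # 100 active at peak (whole pool)
```

With these assumed numbers, most of the cluster can sit idle near midnight, while at peak every server is in use and there is nothing left to save, which matches the pattern in the reported results.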

The results of autoscaling on power consumption for Facebook. Source: Facebook

As you can see, the power consumption of their autoscaled cluster was quite a bit lower than that of their base cluster, especially at non-peak hours. As a result, autoscaling helped them both save money and minimize their environmental impact when scaling.

Morpheus Autoscaling Tutorial

Autoscaling in Morpheus takes just a few simple clicks. Morpheus also allows you to easily scale instances based on CPU, RAM, I/O, or custom schedules. Here’s a quick autoscaling tutorial in Morpheus.

Start Autoscaling with Morpheus