Lightly used applications do not need to be running 100% of the time, and you shouldn’t be paying as if they did.

If you deploy applications on Google App Engine (GAE), you will start to notice that you are paying for two instances instead of just one whenever your application *is* running. This how-to walks through scaling back to a single instance for applications in flexible environments that do not need to be up 100% of the time and do not have strict reliability/latency constraints.

Environment: Standard vs. Flexible

Before we look at how services are configured, or even at the cost data, we need to know the difference between the two environment types offered by GAE. The App Engine documentation outlines the differences in full; several of them relate directly to cost.

Flexible: In a flexible environment you are paying for usage of vCPU, memory, and persistent disks. It is generally more cost effective if you have regular traffic patterns that require scaling up and down gradually.

Standard: In a standard environment you are paying for only what you need (e.g. instance hours) and can scale to 0 instances when there is no traffic. It is generally more cost effective for small applications that do not take traffic all of the time.

With this understanding in mind, you can take a look at your applications’ app.yaml files to see which environment they are configured to use. A flexible service sets env: flex; if there is no env key specified, the service is deployed to the standard environment.
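As a rough illustration of what to look for, a flexible-environment app.yaml for a Node.js service might contain something like the sketch below. The file contents are an assumption for illustration, not taken from a real project, and newer runtimes may require additional keys.

```yaml
# app.yaml (illustrative sketch)
runtime: nodejs
env: flex   # this key marks the service as flexible; without it the service is standard
```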

Scaling Down Flexible Environments

If you need to run a flexible environment for technical reasons, such as the ability to run Node.js or Ruby, a requirement for SSH debugging (do you, really?), or the ability to run background processes, you may still be able to tune the scaling parameters of your service.

If you have a small application that does not have any scaling or resource parameters specified, that is a good place to start.

The first parameter that we will look at tuning is the automatic_scaling parameter. Since our example has a runtime of nodejs, we will continue with that.
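For orientation, the sketch below spells out what an automatic_scaling block looks like when written explicitly. The values are believed to match the documented flexible-environment defaults, but treat them as assumptions and confirm them against the app.yaml reference before relying on them.

```yaml
# app.yaml (illustrative sketch; values are assumed defaults, verify before use)
runtime: nodejs
env: flex

automatic_scaling:
  min_num_instances: 2        # assumed default; this is why two instances are billed
  max_num_instances: 20       # assumed default upper bound
  cool_down_period_sec: 120   # assumed default
  cpu_utilization:
    target_utilization: 0.5   # assumed default
```

An app.yaml that omits this block entirely should behave as if these defaults were set.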

There are two documents to review: 1) An Overview of App Engine, and 2) Configuring your App with app.yaml. You may already have these pages open or have read them previously but they are good references.

If your application does not require redundancy or high availability, you can actually scale it down to a single instance. By default GAE will deploy 2 instances for latency and redundancy/reliability purposes.

To decide how to set parameters such as min_num_instances and max_num_instances (from the second document above), first review the instance metrics for your application. Keep in mind that if you scale below 2 instances (the default), you will take a hit on latency and redundancy/reliability.
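A minimal sketch of a single-instance configuration follows, assuming the service really can tolerate the latency and reliability trade-off described above. The resources values are illustrative assumptions, not recommendations; size them from your own metrics.

```yaml
# app.yaml (illustrative sketch)
runtime: nodejs
env: flex

automatic_scaling:
  min_num_instances: 1   # drop below the default of 2
  max_num_instances: 1   # cap at one instance; raise this if occasional scale-up is acceptable

# Optional resource settings (illustrative values only)
resources:
  cpu: 1
  memory_gb: 0.6
  disk_size_gb: 10
```

The change takes effect the next time the service is deployed with gcloud app deploy. If the service never needs to scale at all, manual_scaling with instances: 1 is another option covered in the same reference.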

In the Google Cloud Platform console, under App Engine, under Instances, select your service. In the drop-down below your service, you will be able to select several metrics to review:

Instance Metrics

Set the time period to something a bit longer, maybe 14 or 30 days, and browse through the metrics.

For my example application, which I will likely be scaling down to a single instance (since it doesn’t require low latency or strict reliability), I took a look at the Summary, Traffic, VM Traffic, CPU Utilization, Instances, Memory Usage, and Disk bytes metrics.