When building a backend that can scale to handle a large volume of data events, load balancing is often the first thing that crosses developers’ minds.

While a load balancer lets you split incoming traffic across different instances of your server, implementing one is often time-consuming and might require rigorous testing to ensure that it works as expected.

Thankfully, Google Cloud offers App Engine, a managed version of this same infrastructure that eliminates the hassle of setting it up in-house.

With a monthly uptime guarantee of 99.00% - 99.95% and a pay-as-you-use pricing model, which means you only pay for the CPU and RAM that your app utilises, App Engine is an enticing choice as a backend for cloud developers.

A key feature of App Engine is its autoscaling algorithm, which increases or decreases the number of active instances based on the CPU load, network traffic and certain other factors.

However, once you start using App Engine with its default settings, you might notice that the throughput provided by it is mediocre at best.

A single App Engine instance we deployed for our products at roobits.com was able to handle only 300 requests per second before scaling up to more instances.

In contrast, the same setup on a single local machine with the same configuration gave us more than 900 requests per second.

On looking further into the cause of this issue, we found the culprit: the default configuration specified in the app.yaml file, which contains specifications like the CPU cores, RAM, disk size, etc. for your App Engine instances.

By default, the app.yaml that you find in Google’s quick start guides specifies only a few parameters, as shown in the screenshot below :
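As a rough illustration, and assuming the flexible environment (where settings such as the resources block apply), a starter app.yaml can be as short as the sketch below. The runtime and the instance limits are placeholder assumptions, not the exact contents of Google’s guide.

```yaml
# Minimal starter configuration (illustrative values only).
runtime: python   # assumption: substitute the runtime your app uses
env: flex         # assumption: App Engine flexible environment

automatic_scaling:
  min_num_instances: 1    # placeholder lower bound
  max_num_instances: 15   # placeholder upper bound
```

Anything not listed here, such as CPU, memory, scaling thresholds and health check intervals, falls back to App Engine’s defaults.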

While it’s good for testing an MVP idea, when you are building something to scale and every single dollar spent matters, this setup might not be very efficient.

We ran into this ourselves, and in this blog I’ll share some tips which might help you increase the throughput provided by App Engine and save some $$ in the process.

Thanks for reading this blog. If you are working at a high-growth company and are looking for a large-scale, real-time data collection platform, take a look at https://roobits.com/.

We might be what you are looking for!

Tweaking the autoscaling parameters :

App Engine comes with an option to modify the autoscaling algorithm in its app.yaml file.

By default, the yaml only specifies the minimum and maximum number of instances that App Engine should scale to.

While that is a good thing, there’s a third parameter called target_utilization, which is not included by default and has a default value of 0.5.

What this means is that as soon as your App Engine instance uses up 50% of its CPU, a new instance will spin up.

In our use-case, this was too aggressive, so we raised the value to 0.9 and ended up cutting the number of spawned instances from 14 to 9, which was roughly a 36% decrease in the overall costs!

You can look at the app.yaml reference for more details on this tweak.
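A minimal sketch of this tweak, assuming the flexible environment’s automatic_scaling block. The instance limits are placeholders rather than our production values:

```yaml
automatic_scaling:
  min_num_instances: 1     # placeholder
  max_num_instances: 15    # placeholder
  cpu_utilization:
    # Defaults to 0.5 when omitted. A higher value lets each instance
    # run hotter before App Engine adds another one.
    target_utilization: 0.9
```

A value this high leaves less headroom for sudden traffic spikes, so it is worth load testing with your own traffic pattern before settling on it.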

Starting with small instances :

The resources block in your app.yaml file allows you to specify the number of CPU cores and the amount of RAM for each instance.

While you might be inclined to start with a machine with 4 cores and 4 GB of RAM, don’t!

Instead, start with a smaller machine (1 CPU core and 1 GB of RAM) and increase the minimum number of instances that App Engine should spawn.

This will ensure that your instances don’t provision more CPU and RAM than required.

For example, assume that a single instance with 1 core and 1 GB of RAM can handle a network load of 300 RPS.

So a machine with 4 cores and 4 GB of RAM will be able to handle 1200 RPS in theory.

If my app receives a load of 1500 RPS, the former setup needs 5 instances (1500 / 300), using 5 cores and 5 GB of RAM, whereas the latter setup needs 2 instances (a single one tops out at 1200 RPS), using 8 cores and 8 GB of RAM.

Since App Engine bills you for the cores and RAM used per hour, the former setup with smaller machines uses 3 fewer cores out of 8, which saves you close to 40% (37.5%) of your costs!
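In app.yaml terms, the small-instance setup could look roughly like the sketch below, assuming the flexible environment. The minimum instance count is a placeholder that you would pick based on your own baseline traffic:

```yaml
resources:
  cpu: 1         # one core per instance
  memory_gb: 1   # 1 GB of RAM per instance

automatic_scaling:
  # Placeholder: raise this number instead of the machine size
  # when you need more baseline capacity.
  min_num_instances: 2
```

With this layout, extra capacity is added one core at a time rather than four cores at a time, which is what keeps the provisioned resources close to the actual load.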

Less frequent health checks :

App Engine by default performs health checks by pinging your instances to ensure that they’re working well.

If an instance doesn’t return a 200 response to these health checks, that instance is deemed unhealthy and restarted.

While these are necessary to ensure that your instances are up and healthy, by default App Engine performs these health checks multiple times per second (in our use-case the App Engine instances received as many as 20 checks per second)!

This could reduce the number of requests that your App Engine instances can process per second, thereby resulting in more instances being spawned.

While disabling health checks is strongly advised against, you can lower their frequency so that they only check your instances once every 10–15 seconds.

You can look at the app.yaml reference guide for instructions on how to do this.
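As a sketch of what this can look like, assuming your app uses the split health checks of the flexible environment (liveness_check and readiness_check). The paths and timeouts below are placeholders:

```yaml
liveness_check:
  path: "/liveness_check"    # placeholder path
  check_interval_sec: 15     # seconds between liveness probes
  timeout_sec: 4

readiness_check:
  path: "/readiness_check"   # placeholder path
  check_interval_sec: 15     # seconds between readiness probes
  timeout_sec: 4
```

If your project still uses the legacy health_check block, it exposes a check_interval_sec setting as well, so the same idea should apply there.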

By reducing the frequency of health checks, we were able to reduce the number of spawned instances by 1, thereby saving us the cost of 1 instance every month!

And that’s it!

Just by tweaking these 3 critical parameters we managed to cut our costs by 30–50%.

The tips mentioned here are specific to our use-case over at https://roobits.com/, but I’m sure that you can find some of them useful for your own use-case as well!
