Configuring Google Cloud CDN with Terraform

How we’re supercharging our web applications.

Introduction

The performance of web applications has increased immensely in recent years. Users now expect the web applications they interact with to load quickly and work smoothly.

When a customer interacts with a web application, static content (like images and CSS) has to be served from a physical location and travel to the user’s device. One could logically assume that the distance and number of network hops to that location correlate with the latency of content delivery and consequently user experience. While we can’t influence users’ network connectivity, we can control how and where we serve the content from.

With Cognite’s recent expansion from Norway to the North American and Japanese markets, we expect to see more and more traffic from those areas of the world. To ensure a good web application experience for all of our users, we have to serve content from as close to them as possible.

Application serving infrastructure at Cognite

We serve content through a reverse proxy web application server deployed on Kubernetes (K8s) and backed by a Google Cloud Storage origin server. Until recently, this server handled all incoming requests, including those for static resources.

Typical application serving request flow. Source: Google.

Web application servers are great for business logic, but they don’t excel at serving static content. Instead, to offload some of the heavy lifting, we wanted our static content to be backed by a fast and reliable global Content Delivery Network (CDN). A CDN is a geographically distributed set of servers that work together to accelerate content delivery from an (edge) location closer to the end users.

Routing static content requests through a global CDN. Source: Google.

At Cognite we are strong believers in Infrastructure as Code. We are heavy users of Terraform, an open-source tool for building, changing, and versioning infrastructure safely and efficiently. We use it to manage infrastructure for our multicloud deployments in a predictable and reproducible manner.

Provisioning a CDN with Terraform was no exception. We looked into Google Cloud CDN, which according to CDNPerf is the fastest CDN in the world. We had hoped that Google’s official Terraform provider would come with a “cloud_cdn” resource type, but unfortunately that was not the case. To enable Cloud CDN using Terraform, we had to look under the hood to understand how it works and what its building blocks are.

How does Google Cloud CDN work under the hood?

According to the official documentation, Cloud CDN works with HTTP(S) load balancing to deliver content. The HTTP(S) load balancer provides the frontend IP addresses and ports that receive requests and maps them to backends that respond to those requests. Cloud CDN can source content from various types of backends, which are also called origin servers.

Distributing traffic to various backends with HTTP(S) Load Balancing. Source: Google.

Google provides an easy-to-use interface in the console to set up a Cloud CDN with a few clicks, but we wanted to replicate this setup across multiple projects using Terraform. We used Google’s load balancing diagram, seen in the picture below, to understand how HTTP(S) load balancing translated into Terraform resource types available in the Terraform provider.

Content-based and cross-regional load balancing diagram. Source: Google.

1. Prerequisites

First off, we have to set up the Terraform providers. Note that we will be using the “google-beta” provider to provision a Google-managed SSL certificate.
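A minimal provider setup might look like the sketch below; the project variable and region are illustrative assumptions, not values from our actual configuration:

```hcl
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
    }
    # The beta provider is needed for the Google-managed SSL certificate later on.
    google-beta = {
      source = "hashicorp/google-beta"
    }
  }
}

provider "google" {
  project = var.project_id # hypothetical variable
  region  = "europe-north1"
}

provider "google-beta" {
  project = var.project_id
  region  = "europe-north1"
}
```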

2. Backend service

At its core, an HTTP(S) load balancer requires some sort of backend to serve requests. In our case, we will use a backend bucket with a storage bucket as the origin server for sourcing our content. The key part here is enabling CDN by setting the “enable_cdn = true” attribute on the backend bucket.
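A sketch of the backend configuration, with hypothetical resource and bucket names:

```hcl
# Storage bucket acting as the CDN origin server.
resource "google_storage_bucket" "cdn_origin" {
  name          = "my-cdn-origin-bucket" # hypothetical; bucket names are globally unique
  location      = "EU"
  storage_class = "MULTI_REGIONAL"
}

# Backend bucket fronting the storage bucket, with Cloud CDN enabled.
resource "google_compute_backend_bucket" "cdn_backend" {
  name        = "cdn-backend-bucket"
  bucket_name = google_storage_bucket.cdn_origin.name
  enable_cdn  = true
}
```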

Note: We are using a multiregional storage class, which comes with slightly greater availability guarantees and faster content delivery.

3. Routing

Now that the backend is ready to serve our requests, we have to route some traffic to it. We need to create a URL map and link it to the backend bucket created in the previous step. Note that, once created, the URL map can be found in the “Load balancing” tab of your “Network services” inside the Google Cloud Console.
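The URL map itself takes only a few lines; in this sketch, all requests fall through to the CDN-enabled backend bucket (resource names are illustrative):

```hcl
# URL map that sends all requests to the CDN-enabled backend bucket.
resource "google_compute_url_map" "cdn_url_map" {
  name            = "cdn-url-map"
  default_service = google_compute_backend_bucket.cdn_backend.self_link
}
```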

The URL map by itself does not actually route requests to the backend service. As the name indicates, it’s just a URL configuration map. It is, however, used by an HTTP(S) proxy to determine which backend service to use. To serve content securely over HTTPS, an SSL certificate has to be associated with the HTTP(S) proxy. To keep things simple, we can use a Google-managed certificate.
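A sketch of the certificate and proxy resources; the domain is a placeholder you would replace with your own:

```hcl
# Google-managed SSL certificate (requires the google-beta provider).
resource "google_compute_managed_ssl_certificate" "cdn_certificate" {
  provider = google-beta
  name     = "cdn-ssl-certificate"

  managed {
    domains = ["cdn.example.com."] # hypothetical domain
  }
}

# HTTPS proxy tying the URL map and the certificate together.
resource "google_compute_target_https_proxy" "cdn_https_proxy" {
  name             = "cdn-https-proxy"
  url_map          = google_compute_url_map.cdn_url_map.self_link
  ssl_certificates = [google_compute_managed_ssl_certificate.cdn_certificate.self_link]
}
```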

Note: The “google_compute_managed_ssl_certificate” resource type is still in beta. Make sure you understand the certificate creation lifecycle and the effects it might have on your system.

4. Frontend

The routing is configured, but we still have to expose our content to the internet. We can use forwarding rules to direct incoming requests to an HTTP(S) proxy. At this point we will also create a global IP address for the forwarding rule to serve on. We need this address in order to associate a DNS record with it in the following steps.
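The frontend could be wired up roughly like this (both resources are global; names are illustrative):

```hcl
# Reserved global IP address that the forwarding rule serves on.
resource "google_compute_global_address" "cdn_ip" {
  name = "cdn-global-address"
}

# Global forwarding rule directing HTTPS traffic to the proxy.
resource "google_compute_global_forwarding_rule" "cdn_forwarding_rule" {
  name       = "cdn-forwarding-rule"
  ip_address = google_compute_global_address.cdn_ip.address
  target     = google_compute_target_https_proxy.cdn_https_proxy.self_link
  port_range = "443"
}
```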

Note: There are two tiers of HTTP(S) load balancing: global and regional. We need to make sure to use the global configuration, as this ensures that we’re taking advantage of Google’s Premium network and all of its 140 edge locations. (You can read more about how Google’s Network Service Tiers compare here.)

5. DNS

We’re almost done! The HTTP(S) load balancing is now in place, but content is only accessible through the global IP address created above. It would be great if we could access it through a more user-friendly URL. We want to avoid referencing our CDN via a hard-coded IP address or sharing one with our users. Let’s fix that by creating a DNS “A” record set.
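Assuming the zone is also managed in Cloud DNS, the record set might look like this (the zone and domain names are placeholders):

```hcl
# "A" record pointing the CDN hostname at the reserved global IP.
resource "google_dns_record_set" "cdn_dns" {
  name         = "cdn.example.com." # hypothetical domain
  managed_zone = "example-zone"     # hypothetical Cloud DNS zone name
  type         = "A"
  ttl          = 300
  rrdatas      = [google_compute_global_address.cdn_ip.address]
}
```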

6. Permissions

If you carefully followed all the previous steps, you probably wonder whether things work at this point. A careful reader will have noticed that we are missing the last, but very important, piece: permissions. If you tried to access the CDN (after the DNS has successfully propagated), you would see a familiar “AccessDenied” error propagated from the storage bucket. This is actually a good sign: it indicates that our requests are routed correctly and are reaching the storage bucket.

Google Cloud Storage AccessDenied error

To make the storage bucket publicly accessible, use the “allUsers” IAM membership. It is important that we only enable read access to individual objects, without permission to list the objects in the bucket. We can do this by associating the membership with the “storage.legacyObjectReader” role.
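A sketch of the IAM binding, reusing the hypothetical bucket resource from the backend step:

```hcl
# Grant anonymous users read access to individual objects only
# (no permission to list the bucket's contents).
resource "google_storage_bucket_iam_member" "public_read" {
  bucket = google_storage_bucket.cdn_origin.name
  role   = "roles/storage.legacyObjectReader"
  member = "allUsers"
}
```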

That’s it!

Tip: Now that we have a working CDN, you should consider wrapping it into a reusable Terraform module. This can help you replicate resources across multiple environments.
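Invoking such a module could look roughly like this, assuming the resources above were moved into a hypothetical local module:

```hcl
module "cdn" {
  source = "./modules/cdn" # hypothetical module path

  project_id  = "my-project"      # hypothetical
  domain_name = "cdn.example.com" # hypothetical
}
```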

Conclusion

A CDN provides a reliable way to serve content to your users as quickly as possible, reducing I/O costs and content delivery latencies. This performance boost can positively impact your customer conversion and retention rates.

In our case, we have observed that up to 60% of our inbound requests are hitting the Cloud CDN cache, avoiding an expensive trip to the origin server. Moreover, we have seen significant improvements (up to 50% in some cases) in First Contentful Paint (FCP) across our applications, which can affect whether a user perceives them as “fast” or “slow.” And the best thing? It doesn’t matter where you’re accessing our application from. All users across the globe will get these improvements for “free.”

There are many excellent CDN providers on the market. Being a Google shop is not a requirement. The fierce competition will hopefully push providers to keep improving their CDN offerings in the future.