Amazon Aurora provides endpoints for the primary DB instance (the cluster endpoint) and for Read Replicas (the reader endpoint). Aurora updates the cluster endpoint automatically so that it always points to the primary instance. The reader endpoint load balances read operations across all available Read Replicas.

Amazon Aurora Replicas typically have less than 100 ms of replication lag. So if your application can tolerate that latency, it can use both the cluster and reader endpoints to take advantage of a horizontally scaled database (Figure 1).

However, managing two database endpoints, one for reads and one for writes, adds complexity to the application. In this post, I show how to use pgpool to provide a single PostgreSQL-compatible Aurora endpoint that automatically directs write traffic to the cluster endpoint and read traffic to the reader endpoint (Figure 2).

Architecture

Pgpool is a BSD-licensed middleware that sits between a PostgreSQL database and the database clients. In this example, we deploy it using the architecture in Figure 3.

The Amazon Aurora cluster consists of one primary instance and two Aurora Read Replicas that are deployed across two Availability Zones and two private subnets. The cluster has a security group that allows ingress only from the pgpool instances.

Pgpool is deployed in an Auto Scaling group with a single active instance for failover purposes. It is also isolated in private subnets with a security group that permits access only from allowed Classless Inter-Domain Routing (CIDR) blocks.

The access subnets host a Network Load Balancer that provides a consistent endpoint for pgpool. So your database endpoint doesn’t change if the pgpool instance fails and the Auto Scaling group creates a new one.

Deploying with AWS CloudFormation

The AWS CloudFormation templates that are used in this example are located in this GitHub repository. The example uses several nested AWS CloudFormation templates to deploy the virtual private cloud (VPC) infrastructure, the security groups, the Aurora cluster, and the pgpool middleware. Using nested stacks lets you break up a large stack into several reusable components. If you aren’t familiar with AWS CloudFormation, review the AWS CloudFormation documentation.

For complete instructions on deploying the templates, see the README file on GitHub. The following sections contain some of the highlights.

Creating the Amazon Aurora cluster

The following AWS CloudFormation snippets show how to create a three-node Amazon Aurora cluster when the VPC infrastructure and security groups are ready. Amazon Aurora has one of the three nodes as the primary and the other two as Read Replicas.

DBAuroraCluster: Type: "AWS::RDS::DBCluster" Properties: DatabaseName: !Ref DatabaseName Engine: aurora-postgresql MasterUsername: !Ref DatabaseUser MasterUserPassword: !Ref DatabasePassword VpcSecurityGroupIds: - !Ref DBFirewall Tags: - Key: Project Value: !Ref ProjectTag DBAuroraOne: Type : "AWS::RDS::DBInstance" Properties: DBClusterIdentifier: !Ref DBAuroraCluster Engine: aurora-postgresql DBInstanceClass: !Ref DbInstanceSize Tags: - Key: Project Value: !Ref ProjectTag DBAuroraTwo: Type : "AWS::RDS::DBInstance" Properties: DBClusterIdentifier: !Ref DBAuroraCluster Engine: aurora-postgresql DBInstanceClass: !Ref DbInstanceSize Tags: - Key: Project Value: !Ref ProjectTag DBAuroraThree: Type : "AWS::RDS::DBInstance" Properties: DBClusterIdentifier: !Ref DBAuroraCluster Engine: aurora-postgresql DBInstanceClass: !Ref DbInstanceSize Tags: - Key: Project Value: !Ref ProjectTag

Deploying pgpool

The blog post Using pgpool and Amazon ElastiCache for query caching with Amazon Redshift is a good reference for deploying pgpool on AWS. In addition, the pgpool documentation contains a section that is specifically for Amazon Aurora.

In the AWS CloudFormation template that deploys pgpool, you’ll set up an ELB load balancer and an Auto Scaling group. The launch configuration for the Auto Scaling group deploys and configures pgpool using the AWS CloudFormation cfn-init tool.

First, you need to install a few packages and unpack the pgpool installation file:

yum groupinstall -y "Development Tools" && yum install -y postgresql-devel wget www.pgpool.net/download.php?f=pgpool-II-3.7.2.tar.gz -O /tmp/pgpool-II-3.7.2.tar.gz tar zxf /tmp/pgpool-II-3.7.2.tar.gz

Next, build pgpool, and create the log and PID directories:

cd /opt/pgpool-II-3.7.2 ./configure && make && make install mkdir -p /var/run/pgpool && mkdir -p /var/log/pgpool && chmod -R 777 /var/run/pgpool && chmod -R 777 /var/log/pgpool

Amazon Aurora uses MD5 authentication, so you need to enter your master database user in a local authentication file.

/usr/local/bin/pg_md5 -m -u ${DatabaseUser} ${DatabasePassword}

You also need to configure MD5 authentication in /usr/local/etc/pool_hba.conf .

host all all 0.0.0.0/0 md5

Finally, register pgpool as a service, start it, and use chkconfig to start it automatically.

Configuring pgpool

Most of the pgpool configuration is done in /usr/local/etc/pgpool.conf . The following table shows the important settings.

Setting Value Notes listen_addresses ‘*’ Allow incoming connections on all interfaces. backend_hostname0 The Amazon Aurora cluster endpoint backend_port0 3306 Amazon Aurora in PostgreSQL mode uses port 3306. backend_flag0 ALWAYS_MASTER Don’t let pgpool try to fail over. backend_hostname1 The Amazon Aurora reader endpoint backend_port1 3306 Amazon Aurora in PostgreSQL mode uses port 3306. enable_pool_hba On Amazon Aurora requires this for authentication. pool_passwd ‘pool_passwd’ Set location of authentication file. Ssl On Amazon Aurora uses Secure Sockets Layer (SSL) connections. replication_mode Off load_balance_mode On master_slave_mode On master_slave_sub_mode Stream sr_check_period 0 health_check_* Configure with master account credentials. fail_over_on_backend_error Off

Testing the configuration

Now you can issue a few SQL statements and confirm that pgpool is directing traffic as expected. Start by creating a table and inserting a few rows:

CREATE TABLE distributors ( did integer PRIMARY KEY, name varchar(40) ); INSERT INTO distributors (did, name) VALUES (5, 'XYZ Widgets') INSERT INTO distributors (did, name) VALUES (6, 'ABC Widgets') INSERT INTO distributors (did, name) VALUES (7, 'QRS Widgets')

Checking in /var/log/pgpool.log , you can see that these statements are directed to the Amazon Aurora primary (node 0):

DB node id: 0 backend pid: 18703 statement: INSERT INTO distributors (did, name) VALUES (7, 'QRS Widgets')

Next, issue a query:

select * from distributors;

Checking in /var/log/pgpool.log , you can see that this statement is directed to the Amazon Aurora reader endpoint (node 1):

DB node id: 1 backend pid: 19793 statement: select * from distributors

As a final test, confirm that pgpool handles SELECT statements embedded in a transaction:

BEGIN; INSERT INTO distributors (did, name) VALUES (10, '10'); INSERT INTO distributors (did, name) VALUES (11, '11'); SELECT * from distributors where did > 9; COMMIT;

These statements should all hit the primary node, as the embedded SELECT is querying rows that are written in the transaction. You can again see from the log that things are working as expected.

DB node id: 0 backend pid: 22614 statement: INSERT INTO distributors (did, name) VALUES (10, '10') DB node id: 0 backend pid: 22614 statement: INSERT INTO distributors (did, name) VALUES (11, '11') DB node id: 0 backend pid: 22614 statement: SELECT * from distributors where did > 9

Of course, you should do a full regression test with your own database clients before using this configuration in production. If the scalability of the pgpool middleware concerns you, consider using a clustered pgpool instance where the Auto Scaling group allows multiple instances and scales based on load.

Strive for simplicity

In this post, we saw how to use pgpool to provide a single endpoint for an Amazon Aurora cluster that automatically directs read traffic to the reader endpoint. You can use this technique to simplify how your application handles connections to Amazon Aurora.

If you need more advanced routing options, such as the ability to route database connections based on custom logic, check out a solution called pgbouncer-rr. With custom logic, you can handle more advanced routing such as sending a small percentage of queries to a newer schema for canary testing.

In the future, keep an eye on Amazon Aurora Serverless, which scales the database backend automatically without requiring the configuration of replicas. This option is still in preview and does not yet support PostgreSQL compatibility.

About the Author

Randy DeFauw is a principal solutions architect at Amazon Web Services. He works with the AWS customers to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.