Deploying Cassandra cluster to AWS using Terraform and AWS Lambda

Intro

Hello everyone. It's been a few weeks since I received a task to investigate strategies for deploying Cassandra on AWS. For me, the best way to learn something is to try it myself, so after some local experiments I started deploying everything on AWS. This is my first article, so it's likely it won't surpass your expectations. I was completely new to Cassandra and had only basic experience with AWS, so consider this material written by a newbie, for newbies; it still assumes you have some basic theoretical knowledge of Cassandra and AWS. I acknowledge that most people who are learning would prefer instructions from someone more experienced than me (I would), but if for some reason you are interested, you are welcome.

General infrastructure overview

This article will walk you through the basics of deploying Cassandra on AWS in one region across multiple availability zones. The whole infrastructure is described as code using Terraform, because I am a big fan of Hashicorp tools, especially Terraform. The general infrastructure overview is shown in the picture below:

General infrastructure overview

We have a VPC in some region with 3 availability zones. Each availability zone has two subnets: private and public. The private subnet is where the Cassandra instances live, and the public one is needed for the NAT gateways. The NAT gateways provide egress-only internet access for the database instances (for updating software, etc.). The bastion instance also lives in a public subnet; the bastion is the only way to access the DB instances from the public network. In summary, there are 3 private subnets with a /24 mask and 3 public subnets with a /28 mask. We also have 4 route tables: one for each private subnet and one shared by the 3 public subnets.
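To illustrate the addressing scheme, here is a small Python sketch that carves three /24 private subnets and three /28 public subnets out of a VPC block. The CIDRs are examples for illustration, not the exact values from my Terraform sources:

```python
import ipaddress

# Example VPC CIDR; the actual value lives in the Terraform variables.
vpc = ipaddress.ip_network("10.0.0.0/16")

# One /24 private subnet per availability zone (Cassandra instances).
private_subnets = list(vpc.subnets(new_prefix=24))[:3]

# Three /28 public subnets (NAT gateways, bastion), carved out of a
# later /24 block so they don't overlap the private ranges.
public_block = list(vpc.subnets(new_prefix=24))[10]  # 10.0.10.0/24
public_subnets = list(public_block.subnets(new_prefix=28))[:3]

for az, net in zip("abc", private_subnets):
    print(f"private-{az}: {net}")
for az, net in zip("abc", public_subnets):
    print(f"public-{az}:  {net}")
```

Running this prints three non-overlapping /24 networks and three /28 networks, one pair per availability zone, mirroring the layout in the picture.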

From the security aspect, we have three security groups:

- The private security group allows ingress for Cassandra services: port 7000 is used for Cassandra internode cluster communication; port 7001 is for Cassandra Secure Socket Layer (SSL) internode communication (not implemented in my case, but left open for further usage); port 7199 is the Cassandra JMX monitoring port; ports 9042 and 9160 are the Cassandra client ports. Traffic to all these ports is allowed from any of the three private subnets. Additionally, port 9160 is open to the public subnets to enable health checks from the load balancer. I also defined an ingress rule for port 22 from the public subnets, which allows you to SSH into instances from your bastion instance. All egress traffic for this security group is allowed. More info in pg. 37 of this doc.
- The public security group allows SSH ingress from anywhere. All egress traffic is allowed too.
- The load balancer security group (not shown in the picture) allows ingress and egress traffic only on port 9160.
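The private security group's rules can be summarized as data. The sketch below models them with a small lookup helper; the subnet CIDRs are illustrative placeholders, the real values come from the Terraform sources:

```python
import ipaddress

# Illustrative CIDRs; the real values come from the Terraform sources.
PRIVATE_SUBNETS = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
PUBLIC_SUBNETS = ["10.0.10.0/28", "10.0.10.16/28", "10.0.10.32/28"]

INGRESS_RULES = [
    # (port, description, allowed source subnets)
    (7000, "internode communication", PRIVATE_SUBNETS),
    (7001, "SSL internode communication", PRIVATE_SUBNETS),
    (7199, "JMX monitoring", PRIVATE_SUBNETS),
    (9042, "CQL native client port", PRIVATE_SUBNETS),
    (9160, "Thrift client port + LB health checks",
     PRIVATE_SUBNETS + PUBLIC_SUBNETS),
    (22, "SSH from bastion", PUBLIC_SUBNETS),
]

def is_allowed(port, source_ip):
    """True if traffic to `port` from `source_ip` matches an ingress rule."""
    addr = ipaddress.ip_address(source_ip)
    return any(
        port == p and any(addr in ipaddress.ip_network(net) for net in nets)
        for p, _desc, nets in INGRESS_RULES
    )
```

For example, a load-balancer health check from a public subnet (`is_allowed(9160, "10.0.10.5")`) passes, while SSH from a private subnet address (`is_allowed(22, "10.0.1.5")`) does not, matching the rules described above.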

For some additional security you can also configure network ACLs, but that is beyond the scope of this article.

Some pitfalls I faced while making this network infrastructure work: NAT gateways must be deployed in public subnets (a bit confusing at first), a route to the proper NAT gateway must be added to every private subnet's route table, and a route to the IGW must be added to the public subnets' route table. All Terraform sources are available here.

After the network infrastructure is configured, let’s move to Cassandra instances.

Cassandra cluster deployment

Cassandra is known as a highly available, scalable, high-performance distributed NoSQL database with no single point of failure. This means that if some nodes go down, the cluster can still operate normally. Let's look at this in more detail.

A Cassandra cluster consists of nodes, and all nodes are peers. Generally they are the same, with one slight difference: they are divided into seed nodes and non-seed nodes. Seed nodes are nodes whose addresses are announced to new nodes joining the cluster. The practical difference between them is that seeds have their own addresses in the "seeds" section of the cassandra.yaml file, while non-seed nodes list only the seed addresses there, not their own. Seeds handle discovery of new nodes and attach them to the cluster by providing information about the other nodes and their states. Therefore, if there are no seed nodes alive in the cluster, new members cannot join. You may be wondering: why not make all nodes seeds? The reason is that seed nodes get updates about the cluster first, and in an all-seed cluster update propagation becomes random. This results in longer convergence time and reduced Gossip performance (details here). The Datastax recommendation is to have 3 seed nodes per datacenter. In my case I used 1 seed node per datacenter, just to save money on my learning process. In real-life cases I recommend following Datastax and having at least 3 seed nodes per datacenter.
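What distinguishes a seed from a non-seed is simply whether its own address appears in the "seeds" list that every node carries in cassandra.yaml. A small sketch of rendering that fragment (the seed IPs are hypothetical):

```python
# Hypothetical seed addresses, one per AZ in this setup (Datastax
# recommends three per datacenter for production).
SEED_IPS = ["10.0.1.10", "10.0.2.10", "10.0.3.10"]

def seeds_section(seed_ips):
    """Render the seed_provider fragment of cassandra.yaml.

    Every node, seed or not, gets the same seed list; a node is a
    seed precisely when its own address is one of these entries."""
    return (
        "seed_provider:\n"
        "  - class_name: org.apache.cassandra.locator.SimpleSeedProvider\n"
        "    parameters:\n"
        f'      - seeds: "{",".join(seed_ips)}"\n'
    )

print(seeds_section(SEED_IPS))
```

This is the fragment the user-data script templates into cassandra.yaml on each instance.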

Some technical details: I used my own pre-built AMI for launching instances. To create the AMI I used another Hashicorp tool, Packer. Sources are available in my GitHub repository. For provisioning I used Ansible, which follows this instruction and installs Cassandra but doesn't start the service. Upon instance creation, user data is used to configure the cluster and start the service (source instructions). As the instance type I used t2.micro (just for learning purposes), but for real-life deployments, according to the GumGum presentation, it is recommended to use r3.2xlarge instances with RAID data storage. I2-series instances may offer better performance, but they are expensive. Also, consider enabling enhanced networking.

For non-seed nodes I used a multi-AZ autoscaling group. Instances are launched from the same AMI, but with slightly different user data. When bootstrapping a new cluster, the auto_bootstrap property in cassandra.yaml is set to "false", and the seeds are launched with that config. For non-seed instances the property is left at its default value, which is true.
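The role-dependent part of the user data boils down to one override. A tiny sketch (function name is illustrative, not from the actual scripts):

```python
def node_overrides(is_seed):
    """cassandra.yaml overrides applied by the user-data script,
    depending on the node's role.

    auto_bootstrap defaults to true; it is only written out explicitly
    as "false" for seed nodes when bootstrapping a brand-new cluster."""
    return {"auto_bootstrap": "false"} if is_seed else {}

print(node_overrides(True), node_overrides(False))
```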

As mentioned before, non-seed instances are launched in an auto-scaling group. In my case I set the min and max values to the same number to keep the cluster size constant, but it is possible to make it more flexible by setting different values. This is another topic to investigate: on the one hand, we can save some money by scaling the cluster up and down; on the other hand, scaling down destroys a node without any preparation, meaning all replicas on that node are deleted. That triggers a cluster rebuild to find other nodes to place the replicas on, according to the replication factor, and may affect performance. This case is beyond the scope of this article, so it's up to you.

To keep the cluster operational, you have to keep the seed nodes consistent and highly available, and their addresses must stay the same. That's where Elastic Network Interfaces come into play.

Using Lambda function to provide fault-tolerance

ENIs allow us to have a predictable address for the seed instances. The problem is that you need to keep those nodes up; put simply, you need fault tolerance for each seed instance. Usually, when we need failover we use an autoscaling group, but we can't launch instances with pre-attached ENIs in an autoscaling group. So I decided to use a Lambda function triggered by a CloudWatch event:

That is how my first experience with JavaScript happened. I needed a function that would detect stopped or terminated instances and recover them. The stopped-instance scenario is pretty simple: we just need to start the instance again. The termination scenario is more complicated: we need to launch a new instance with the regular parameters such as instance type, AMI, keys, etc. Here I faced the problem of passing user_data to my Lambda function. The user data is actually a script which modifies the Cassandra configuration file. Usually I pass variables from Terraform to Lambda as environment variables, but I don't think passing a bash script as an environment variable is the right way, so I passed it as a long and unpleasant string with a lot of backslashes. If you know other ways to pass user data from Terraform into a Lambda function, please let me know. The instance is also launched with the available ENI (to keep its address) and a tag. You can use tags to locate Cassandra resources among other resources in AWS, in case you have a shared subscription. After this, the created instance has to be attached to the load balancer and added to the CloudWatch event. The logic diagram of the function is shown here:

Logic diagram of Lambda function

To make this work you also need to give the Lambda function access to certain resources by assigning it a proper IAM role. You can find it in my Terraform script for the IAM Lambda role.
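The decision logic described above is simple at its core. Here is a Python sketch of it (the actual function was written in JavaScript, and the AWS API calls themselves are omitted):

```python
def recovery_action(state):
    """Decide what the recovery function should do when a CloudWatch
    EC2 state-change event fires for a seed instance.

    - "stopped": the instance and its ENI still exist, so just start it;
    - "terminated": launch a replacement from the AMI with the same
      user data, attach the free ENI, register it with the load
      balancer and add it back to the CloudWatch event.
    Any other state needs no action."""
    if state == "stopped":
        return "start"
    if state == "terminated":
        return "recreate"
    return "ignore"  # pending, running, stopping, shutting-down, ...

print(recovery_action("stopped"), recovery_action("terminated"))
```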

Thus, we have a highly available, fault-tolerant Cassandra cluster on AWS. Frankly speaking, that kind of automation (I mean the Lambda function) may be redundant: it can't cover all disaster scenarios and still requires manual intervention, but it can serve as an emergency fix that saves your sleep at night.

Some useful information about keyspaces

Some technical intro: Cassandra supports network topology awareness for more efficient data distribution and fault tolerance. Each node can know which datacenter and rack it is launched in; how this works is determined by the snitch. In my case I used Ec2Snitch, which treats the region as the datacenter and availability zones as racks. For more datacenters you can use the cassandra-rackdc.properties file, but in my case I have one datacenter.
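A rough sketch of how Ec2Snitch derives the datacenter and rack names from the availability zone (this mimics the classic behaviour, including the special case where "-1" regions drop the trailing number; treat it as an approximation, not the actual implementation):

```python
def ec2snitch_names(az):
    """Approximate Ec2Snitch naming: AZ -> (datacenter, rack).

    The rack is the trailing "<number><letter>" part of the AZ name;
    the datacenter is the region, except that "-1" regions drop the
    number (so "us-east-1a" yields datacenter "us-east")."""
    rack = az.split("-")[-1]   # "us-east-2a" -> "2a"
    dc = az[:-1]               # "us-east-2a" -> "us-east-2"
    if dc.endswith("-1"):
        dc = dc[:-2]           # "us-east-1"  -> "us-east"
    return dc, rack

print(ec2snitch_names("us-east-2a"))
```

This matches the `nodetool status` output later in the article, where the datacenter is us-east-2 and the racks are 2a, 2b and 2c.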

So, to create a durable keyspace you need to choose a proper replication strategy. It is recommended to use NetworkTopologyStrategy, as it places replicas in different racks and datacenters. To determine how many replicas you need, balance being able to satisfy reads locally, without incurring cross-datacenter latency, against the failure scenarios you want to survive, as described in the documentation. Keyspace creation with all its options is described well on this documentation page, and that is the point where database operation starts, which is beyond the scope of this article.
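One way to reason about that trade-off is the quorum arithmetic behind QUORUM reads and writes:

```python
def quorum(rf):
    """Replicas required for a QUORUM read or write at replication factor rf."""
    return rf // 2 + 1

def tolerated_failures(rf):
    """Replicas that may be down while QUORUM operations still succeed."""
    return rf - quorum(rf)

# With RF=3 (e.g. one replica per availability zone in this single-DC
# setup), QUORUM needs 2 replicas, so losing one replica -- or one
# whole AZ -- still leaves the cluster serving QUORUM requests.
print(quorum(3), tolerated_failures(3))
```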

That is basically all I wanted to share with you about the theory. The next topic covers some practical instructions in case you want to try it yourself.

Deploy Cassandra cluster in a few steps

Here is an instruction that allows you to deploy an operational Cassandra cluster in just a few steps. First of all, consider that launching this could cost you money! If you are OK with that, you have to meet some prerequisites: you need Terraform, Packer and Ansible installed on your local system, and access to AWS. You also need to create an SSH key pair on AWS. So, let's begin:

1. Clone the sources from this git repository to some directory.
2. Set the environment variables for Terraform and Packer: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION.
3. Build the custom image with Packer. You have to provide it with a subnet_id, vpc_id and the source_ami of an AWS Linux AMI for your region.

IMPORTANT NOTE: your subnet MUST be public and accessible by Ansible. In addition, on some Linux distributions like CentOS 7 you should rename the packer binary, because another default tool is also named packer.

When all variables in packer .json file are filled, launch packer with

packer build <filename>

command in the “packer” directory. After the build finishes successfully, the end of the output should look similar to this:

==> Builds finished. The artifacts of successful builds are:
--> amazon-ebs: AMIs were created:

us-east-2: ami-1234567

Copy the AMI ID to your clipboard.

4. Provide Terraform with the created AMI and your SSH key pair (edit the /terraform/vars.tf file, cassandra_ami variable). You should also change the bastion_ami according to your region; it could be any Linux image, I used RHEL. Feel free to change the other variables (tags, instance types, availability zones, etc.).

Notice that not every region and AZ supports all the resources used in this demo, so for some regions you will have to modify the scripts or take some additional steps to make it work. This was tested in the us-west-2 and us-east-2 regions.

5. Launch terraform with

terraform init && terraform apply

commands in the “terraform” directory. Wait a few minutes until the job is done. In the end you will get an output with the public DNS name of the load balancer. Using it, you can access your Cassandra cluster with the default credentials (cassandra, cassandra). You can check that it is working from your local Cassandra installation by typing:

cqlsh <load_balancer_DNS> -u cassandra -p cassandra

You should see the cqlsh> prompt. A Cassandra single-node installation instruction is available here. From here you can create keyspaces, users, etc. Your next really important step is to change the default superuser credentials. Alter the cassandra user and set a new password, preferably something long and incomprehensible:

ALTER USER cassandra WITH PASSWORD '#@)₴?$0R3@11y$tR0nGP4$$W0rd4';

Also, consider making the system_auth keyspace redundant by altering it and changing the replication parameters:

ALTER KEYSPACE system_auth WITH REPLICATION = {'class':'NetworkTopologyStrategy', 'DATACENTER_NAME':N };

where DATACENTER_NAME is your datacenter name (the region, with Ec2Snitch) and N is the number of replicas; for complete redundancy it can be set to the total number of nodes in the cluster.

If you want to check your cluster with nodetool, you have to log in to an instance over SSH. You can do that only from the bastion instance; I recommend using ssh-agent to forward your keys. When logged in to any of your Cassandra instances, run

sudo nodetool status

to display information about the cluster. You should see output similar to this:

Datacenter: us-east-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns   Host ID      Rack
UN  10.0.1.148  81.42 KB   256     16.1%  a438trh2...  2a
UN  10.0.3.176  66.04 KB   256     16.4%  5cc53d13...  2c
UN  10.0.1.50   91.2 KB    256     15.0%  a4385672...  2a
UN  10.0.2.97   109.21 KB  256     17.0%  3b4fde4d...  2b
UN  10.0.3.50   86.76 KB   256     17.2%  8bd4e613...  2c
UN  10.0.2.50   73.8 KB    256     18.3%  2260e99a...  2b

Conclusion

So, we have an operational Cassandra cluster on AWS. Congrats, you survived to the end! I hope somebody finds this information useful. If you have any feedback, please let me know.