Over the past year, I’ve spent a lot of time working with Cloud Bigtable on Google Cloud Platform (GCP). Lately, I’ve wanted to get a better understanding of similar big data offerings on the market, specifically HBase and Cassandra.

I’d never set up anything like Cassandra before and was intimidated by all the tutorial videos I watched, but it proved to be a very manageable task using GCP’s Click to Deploy.

Deploying

In a GCP project with billing enabled, navigate to the Cloud Marketplace and search for “Cassandra.”

There are a few options that come up, but I chose the Cassandra (Google Click to Deploy) version that runs on Google Compute Engine.

Click “Launch on Compute Engine” and you’ll be brought to a deployment configuration form. You can leave all the defaults or modify them if you are more familiar with Cassandra. The only thing I changed was enabling Stackdriver logging and monitoring in case I needed them for debugging. Once you’re happy with the configuration, submit the form to deploy your Cassandra cluster!

Connecting

After a few minutes your cluster will be deployed and you can connect.

Successfully deployed Cassandra cluster

Click the SSH button in the right-hand pane to open a shell connected to your cluster. In the shell you can start the Cassandra command-line client by typing cqlsh. From there you can create a keyspace and a table, add some data, and query it, all through that shell. I followed this Hello World and found it pretty helpful.
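For reference, the kind of session described above can be sketched as follows. This is a minimal sketch, not the linked Hello World: the keyspace `demo`, the table `greetings`, and the file name `hello.cql` are hypothetical names chosen for illustration, and SimpleStrategy with a replication factor of 1 is assumed to be acceptable for a throwaway test.

```shell
# Write a few CQL statements to a file (all names here are hypothetical).
cat > hello.cql <<'EOF'
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE IF NOT EXISTS demo.greetings (id int PRIMARY KEY, message text);
INSERT INTO demo.greetings (id, message) VALUES (1, 'Hello, world');
SELECT * FROM demo.greetings;
EOF

# Run the file with cqlsh when it is available, as it is in the SSH shell above.
if command -v cqlsh >/dev/null 2>&1; then
  cqlsh -f hello.cql
fi
```

You could also paste the same statements one at a time at the cqlsh prompt, which makes it easier to see what each one does.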

I also wanted to connect to my Cassandra cluster with one of the client drivers to see how I would use it in a real application. You’ll need to create a new firewall rule that allows TCP traffic on port 9042, so the client can talk to the cluster. You can use this gcloud command:

gcloud compute firewall-rules create cassandra-client --allow tcp:9042

or create the firewall rule in the GCP console under VPC network.

List of firewall rules under VPC network

Add a firewall rule that targets all instances in the network, uses 0.0.0.0/0 as the source IP range, and allows TCP:9042. A source range of 0.0.0.0/0 admits traffic from any address, so if you’re using this for production purposes, you should restrict the rule to the addresses your clients actually connect from.
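As a rough illustration of a more controlled rule, the sketch below limits port 9042 to a single client address. The IP address (taken from a range reserved for documentation) and the rule name `cassandra-client-restricted` are hypothetical placeholders, and the command is wrapped in a function so nothing is created until you call it.

```shell
# Hypothetical values: the address your client connects from and a rule name.
CLIENT_IP="203.0.113.7"
RULE_NAME="cassandra-client-restricted"
SOURCE_RANGE="${CLIENT_IP}/32"   # a /32 range matches exactly one address

# Allow TCP 9042 only from that single address.
create_restricted_rule() {
  gcloud compute firewall-rules create "$RULE_NAME" \
    --allow tcp:9042 \
    --source-ranges="$SOURCE_RANGE"
}

echo "Would allow tcp:9042 from $SOURCE_RANGE"
# Call create_restricted_rule once the values above are correct.
```

If an open rule for port 9042 already exists, it will most likely need to be deleted as well (for example with `gcloud compute firewall-rules delete cassandra-client`), since any matching allow rule is enough to let traffic through.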

Now that your firewall is set up, you can connect with the client of your choice. I decided to use the DataStax Java Driver, and ran a quick query to see if my connection succeeded. I used the external IP of the first VM (which can be found under Compute Engine in VM Instances) as the address to connect to, and the region (the zone without the letter suffix) as the name of the local datacenter.

And I was happy to see that my query worked!
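If you want to sanity-check those two values before wiring up a driver, something like the following may help. It is a sketch under assumptions: the zone and external IP are hypothetical placeholders, and the connectivity check uses cqlsh from your own machine rather than the Java driver, which assumes a locally installed cqlsh that is compatible with the cluster's Cassandra version.

```shell
# Hypothetical values: replace with your VM's zone and external IP.
ZONE="us-central1-f"
EXTERNAL_IP="203.0.113.10"

# The local datacenter name is the zone without its letter suffix:
# strip the shortest trailing "-<something>" from the zone.
LOCAL_DC="${ZONE%-*}"
echo "local datacenter: $LOCAL_DC"

# Optional reachability check over port 9042, using cqlsh instead of a driver.
check_connection() {
  cqlsh "$EXTERNAL_IP" 9042 -e "SELECT release_version FROM system.local;"
}
# Call check_connection after the firewall rule is in place.
```

With the placeholder zone above, the script prints `local datacenter: us-central1`.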

For more information, check out the Apache documentation or the DataStax documentation.

Thanks to Daniel Bergqvist (@bexie), Kristen O’Leary, and Robert Kubis (@hostirosti).