Permission boundaries are hard, especially with databases. You need them hidden away in private subnets, but you also want highly available access to them without hassle. Traditionally, you would use a bastion host (AKA jump server) in a public subnet to reach your resources in private subnets, which works for its purpose. But managing these servers was cumbersome and annoying: they lived with public DNS, and they typically relied on SSH for access control, which meant admins had to keep SSH keys in sync with their IAM policies by hand.

Enter AWS Session Manager, AKA SSM. This tool has been widely blogged about, as it grants access to servers through IAM policies instead of SSH keys. From a quick search, I found these great resources:

- https://medium.com/@dnorth98/hello-aws-session-manager-farewell-ssh-7fdfa4134696 (says that RDS access is possible, but doesn't show how)
- https://www.reddit.com/r/aws/comments/df6uip/ssm_tunnelling_ec2_what_about_rds/ (gives a bash script to access RDS, but doesn't explain it or how to set up the necessary infra)
- https://aws.amazon.com/blogs/aws/new-session-manager/ (introduces shell access, without much depth or examples)
- https://binx.io/blog/2019/02/02/how-to-login-to-ec2-instances-without-ssh/ (has CloudFormation templates, which is nice, but doesn't show RDS access)
- https://cloudonaut.io/goodbye-ssh-use-aws-session-manager-instead/ (no talk of RDS)
- https://medium.com/tensult/use-aws-system-manager-bastion-free-ssh-key-free-access-to-ec2-instances-e6897c4143c5 (no talk of RDS)
- https://github.com/elpy1/ssh-over-ssm (gives a CLI tool for secure access to RDS, without much instruction on how to create the infrastructure)

Even with these awesome resources, it wasn't immediately clear how to get started, especially with modern infrastructure management practices like terraform. And tools like ssh-over-ssm require significant prerequisite knowledge to make use of them. Just about everyone on the planet with RDS instances wants to access them from a local port, so the goal of this codelab is to explore how to get secure access from scratch. It will explore some older ways of getting access, which will hopefully help explain why the industry has moved to the current best-practice approach of combining EC2 Instance Connect with SSM. At each step, I will create a fully working example with a basic security setup. The goal of this tutorial is to create near-production-ready examples.

Now for a haiku:

Start with the basics
Security can be hard
It takes time to learn

Making a VPC

To start with, let's whip up a quick RDS server in a private subnet with no internet access. In a new terraform module, add the following code:

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 2.18.0"

  name = "codelab-vpc"
  cidr = "10.0.0.0/16"
  azs  = ["eu-west-1a", "eu-west-1b"]

  # For the bastion host
  public_subnets = ["10.0.101.0/24", "10.0.102.0/24"]

  # For the RDS instance
  database_subnets = ["10.0.1.0/24", "10.0.2.0/24"]

  # Allow private DNS
  enable_dns_hostnames = true
  enable_dns_support   = true
}
```

This describes a VPC that will hold all of the resources in this codelab. It creates two public subnets that we can put the bastion host into, and two private subnets that we can put a database in. It also enables private DNS so that our bastion server will be able to reach the RDS endpoint.

To actually create these resources, run terraform init followed by terraform plan -out plan. If the plan looks good to you, apply it with terraform apply plan.

Making an RDS Instance in that VPC

Now, let's create an RDS instance.
In the same terraform file, add:

```hcl
module "db" {
  source  = "terraform-aws-modules/rds/aws"
  version = "2.5.0"

  # Put the DB in a private subnet of the VPC created above
  vpc_security_group_ids = [module.db_security_group.this_security_group_id]
  create_db_subnet_group = false
  db_subnet_group_name   = module.vpc.database_subnet_group

  # Make it postgres just as an example
  identifier     = "codelab-db"
  name           = "codelab_db"
  engine         = "postgres"
  engine_version = "10.6"
  username       = "codelab_user"
  password       = "codelab_password"
  port           = 5432

  # Disable stuff we don't care about
  create_db_option_group    = false
  create_db_parameter_group = false

  # Other required variables that we don't care about in this codelab
  allocated_storage  = 5 # GB
  instance_class     = "db.t2.small"
  maintenance_window = "Tue:00:00-Tue:03:00"
  backup_window      = "03:00-06:00"
}

module "db_security_group" {
  source  = "terraform-aws-modules/security-group/aws"
  version = "3.1.0"

  name   = "codelab-db-sg"
  vpc_id = module.vpc.vpc_id

  # Allow all incoming PostgreSQL traffic from the VPC
  ingress_cidr_blocks = [module.vpc.vpc_cidr_block]
  ingress_rules       = ["postgresql-tcp"]

  # Allow all outgoing HTTP and HTTPS traffic for updates
  egress_cidr_blocks = ["0.0.0.0/0"]
  egress_rules       = ["http-80-tcp", "https-443-tcp"]
}
```

You can use any other config you'd like for the RDS instance; my setup here is just meant to be as simple as possible, with a small postgres instance. It is important to note that the security group is open to incoming TCP on port 5432 from within the VPC, so that we can query it through the bastion host.

Run another terraform init, then terraform plan -out plan. If the plan looks good, apply it with terraform apply plan. It can take up to 40 minutes to provision a new RDS instance (in my case it took 9 minutes), so please be patient. We only have to do this once.

Wrap Up

Woohoo! You now have a database we can use to test bastion configurations with :) It is not accessible from the public internet, and it has a strict security policy.

Here's a haiku about RDS instances in VPCs:

Made a VPC
Also made an RDS
One in the other
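If you'd like to convince yourself that the database really is unreachable from your laptop, here's a small sketch. The endpoint hostname below is made up for illustration; substitute the address part of `terraform output rds_endpoint` once it exists:

```shell
# port_open succeeds only if a TCP connection can be made within 3 seconds.
# Uses bash's /dev/tcp support, so run this with bash.
port_open() {
  timeout 3 bash -c ">/dev/tcp/$1/$2" 2>/dev/null
}

# Hypothetical endpoint host, for illustration only
if port_open "codelab-db.abc123.eu-west-1.rds.amazonaws.com" 5432; then
  echo "reachable (unexpected for a private subnet)"
else
  echo "unreachable, as intended"
fi
```

Because the instance lives in a private subnet with a VPC-only security group, the connection attempt should always time out or be refused from outside the VPC.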

Intro

The RDS instance isn't super interesting yet, because it doesn't have any tables, data, or access set up. Because our security policy and DNS are so strict, we can't directly query it in any way yet. In this step, we will set up a standard bastion server that we can SSH to, which will let us query the database.

Terraform Changes

In the same terraform file as before, add the following:

```hcl
module "ssh_key_pair" {
  source  = "terraform-aws-modules/key-pair/aws"
  version = "0.2.0"

  # Feel free to change if you want to use a different public key
  public_key = file("~/.ssh/id_rsa.pub")
  key_name   = "bastion_public_key"
}

module "bastion_security_group" {
  source  = "terraform-aws-modules/security-group/aws"
  version = "3.1.0"

  name   = "codelab-bastion-sg"
  vpc_id = module.vpc.vpc_id

  # Allow all incoming SSH traffic
  ingress_cidr_blocks = ["0.0.0.0/0"]
  ingress_rules       = ["ssh-tcp"]

  # Allow all outgoing HTTP and HTTPS traffic, as well as communication to the db
  egress_cidr_blocks = ["0.0.0.0/0"]
  egress_rules       = ["http-80-tcp", "https-443-tcp", "postgresql-tcp"]
}

module "bastion" {
  source  = "terraform-aws-modules/ec2-instance/aws"
  version = "2.12.0"

  # Ubuntu 18.04 LTS AMI
  ami = "ami-035966e8adab4aaad"

  name                        = "codelab-bastion"
  associate_public_ip_address = true
  instance_type               = "t2.small"
  key_name                    = module.ssh_key_pair.this_key_pair_key_name
  vpc_security_group_ids      = [module.bastion_security_group.this_security_group_id]
  subnet_ids                  = module.vpc.public_subnets
}

###########
# Outputs #
###########

output "bastion_ip" {
  value = module.bastion.public_ip[0]
}

output "rds_endpoint" {
  value = module.db.this_db_instance_endpoint
}
```

This creates an Ubuntu 18.04 EC2 server in a public subnet of the VPC we created earlier. Its security group is open to incoming SSH and to outgoing communication with the RDS instance. It is set up so that your local ~/.ssh/id_rsa SSH key can be used to authenticate to the instance.

After the outputs of terraform init and terraform plan -out plan look good, run terraform apply plan to create the resources. Once that completes, we're ready to SSH onto the instance!

Establishing SSH Tunnel

To begin, run the command:

```shell
ssh -L 5432:`terraform output rds_endpoint` -Nf ubuntu@`terraform output bastion_ip`
```

Breaking this down:

- ssh ubuntu@`terraform output bastion_ip`: SSH to the bastion server we just created
- -L 5432:`terraform output rds_endpoint`: Forward the remote database socket to local port 5432
- -f: Send the ssh command execution to a background process so the tunnel stays open after the command completes
- -N: Don't execute anything remotely. As we are just port forwarding, this is fine.

Once complete, our tunnel has been established. Now let's create some test data!

Creating Test Data

We're going to add some test data to the RDS instance here, just to make it easy to validate that we can still access the data later once we use more complicated setups. Note that this step is entirely optional.

Install psql and postgres any way you know how, such as sudo apt-get -y install postgresql postgresql-contrib on ubuntu. Then, let's create a table:

```shell
psql -d codelab_db -p 5432 \
     -h localhost \
     -U codelab_user \
     -c "CREATE TABLE codelab_table (Name varchar(255))"
```

and add some data:

```shell
psql -d codelab_db -p 5432 \
     -h localhost \
     -U codelab_user \
     -c "INSERT INTO codelab_table (Name) VALUES ('codelab_data')"
```

If you're prompted for a password, use the password we set in terraform earlier, codelab_password.

Finally, let's close the tunnel with kill $(lsof -t -i :5432), which kills the process that controls local port 5432.

Haiku

Do you hike? Cuz I want to haiku:

We did it. We're in!
Let the bastions grind you down?
Nah, I conquer them.
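One detail of the tunnel command that's easy to miss: `terraform output rds_endpoint` already includes the port, so the -L flag expands to local_port:remote_host:remote_port in one go. A quick sketch, using a hypothetical endpoint value (the RDS module's endpoint output is formatted as "address:port"):

```shell
# Hypothetical value of `terraform output rds_endpoint`, for illustration
endpoint="codelab-db.abc123.eu-west-1.rds.amazonaws.com:5432"

# If you ever need the pieces separately, shell parameter expansion works:
db_host="${endpoint%:*}"   # strip the ":port" suffix -> the address
db_port="${endpoint##*:}"  # strip everything up to the last ":" -> the port

# This is exactly what ssh's -L argument receives:
echo "-L 5432:${endpoint}"
echo "host=${db_host} port=${db_port}"
```

This is why no explicit remote port appears in the ssh command: the endpoint output supplies it.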

Cool AWS Services

Pre September 2018, what we have now would be considered standard, even cutting edge for those adopting cloud services. But in that September, AWS Systems Manager Session Manager (SSM) was announced, which helps provide secure, access-controlled, and audited EC2 management. Source: https://aws.amazon.com/about-aws/whats-new/2018/09/introducing-aws-systems-manager-session-manager/

A related service, EC2 Instance Connect, was introduced in June of 2019: https://aws.amazon.com/about-aws/whats-new/2019/06/introducing-amazon-ec2-instance-connect/. It enabled IAM-based SSH controls with CloudTrail auditing, as well as browser-based SSH from the AWS Console for those who like web GUIs.

Why they are so cool

These two products are superior to our current bastion setup for a few reasons:

- Managing SSH keys is hard, especially across multiple production environments. IAM-based access is easier, especially if you're already managing IAM access to other resources. In our current example, managing a single key is easy. But you'd need to be creative to manage multiple keys, especially if you wanted them to dynamically stay up to date as new members join your company.
- Databases are.. well.. important. Auditing access to them can be critical, but auditing SSH sessions manually is a pain.
- SSM enables you to access instances in private subnets (as long as they have outbound internet access through a NAT gateway).
- By not requiring public DNS access, attackers can only attack your instance if they have access to your AWS account. This is a huge improvement over a public IP address that could be hit with any sort of SSH-based attack. We all know you don't constantly keep OpenSSL up to date on all your exposed servers ;)

For the next step, we'll add support for SSH'ing to our instance over EC2 Instance Connect.

Haiku

Did you ever hear of the time when the villagers of a mountainous domain overthrew their government? It was called the high coup. And it went a little something like this:

AWS
Services out the wazoo
Gotta learn 'em all

Why EC2 Instance Connect is Awesome

One of the cool things about EC2 Instance Connect is that you don't use long-term SSH keys that need to live on the bastion instance. Instead, you use the aws ec2-instance-connect CLI tools to send a temporary public key to the instance, which you then have 60 seconds to authenticate to using the matching private key. After the 60 seconds are up, the public key is forgotten.

This is more powerful than it may at first seem. What this really means is that:

- Access is now controlled by IAM policies over who can use aws ec2-instance-connect send-ssh-public-key
- You don't need to manage SSH keys on the instance
- AWS can automagically audit all access, just by watching when people send SSH keys

This requires a few quick updates to our terraform.

Updating Terraform Code

First, you can entirely remove the ssh_key_pair module, as we no longer need to manage SSH keys :)

Then, add an IAM instance profile that will allow your bastion to make use of the EC2InstanceConnect policy:

```hcl
module "instance_profile_role" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role"
  version = "~> 2.7.0"

  role_name               = "codelab-role"
  create_role             = true
  create_instance_profile = true
  role_requires_mfa       = false
  trusted_role_services   = ["ec2.amazonaws.com"]
  custom_role_policy_arns = ["arn:aws:iam::aws:policy/EC2InstanceConnect"]
}
```

Next, update your bastion module to remove the key_name, make use of the new instance profile, and install ec2-instance-connect, by making it look like:

```hcl
module "bastion" {
  source  = "terraform-aws-modules/ec2-instance/aws"
  version = "2.12.0"

  # Ubuntu 18.04 LTS AMI
  ami = "ami-035966e8adab4aaad"

  name                        = "codelab-bastion"
  associate_public_ip_address = true
  instance_type               = "t2.small"
  vpc_security_group_ids      = [module.bastion_security_group.this_security_group_id]
  subnet_ids                  = module.vpc.public_subnets
  iam_instance_profile        = module.instance_profile_role.this_iam_instance_profile_name

  # Install dependencies
  user_data = <<USER_DATA
#!/bin/bash
sudo apt-get update
sudo apt-get -y install ec2-instance-connect
USER_DATA
}
```

The very last update is to add two new outputs:

```hcl
output "instance_id" {
  value = module.bastion.id[0]
}

output "az" {
  value = module.bastion.availability_zone[0]
}
```

As per usual, run terraform init then terraform plan -out plan. If the output looks good (note that the bastion instance should be recreated), run terraform apply plan.

Hooray! You have now enabled EC2 Instance Connect on the instance. To connect, we need to generate a temporary SSH key, send the public key to the instance, and then create an SSH tunnel using the private key.
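Since access is now gated on who may call send-ssh-public-key, you can also scope that permission tightly in IAM. The snippet below is a sketch, not a required part of this codelab: the region, account id, and instance id are placeholders, and you'd still need to attach the resulting policy to your users, though the ec2-instance-connect:SendSSHPublicKey action and the ec2:osuser condition key are real.

```hcl
# Sketch only: limit key-pushing to one instance and one OS user.
# The region, account id, and instance id here are placeholders.
data "aws_iam_policy_document" "send_ssh_key" {
  statement {
    actions   = ["ec2-instance-connect:SendSSHPublicKey"]
    resources = ["arn:aws:ec2:eu-west-1:123456789012:instance/i-0123456789abcdef0"]

    condition {
      test     = "StringEquals"
      variable = "ec2:osuser"
      values   = ["ubuntu"]
    }
  }
}
```

With a policy like this, even a compromised credential could only push keys for the ubuntu user on the one bastion instance.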
Establishing SSH Tunnel

This is as easy as:

```shell
# Generate a throwaway key, overwriting any previous one
echo -e 'y\n' | ssh-keygen -t rsa -f /tmp/temp -N '' >/dev/null 2>&1

# Push the public key to the instance (valid for 60 seconds)
aws ec2-instance-connect send-ssh-public-key \
    --instance-id `terraform output instance_id` \
    --availability-zone `terraform output az` \
    --instance-os-user ubuntu \
    --ssh-public-key file:///tmp/temp.pub

# Open the tunnel using the matching private key
ssh -L 5432:`terraform output rds_endpoint` -Nf -i /tmp/temp ubuntu@`terraform output bastion_ip`
```

Verifying our Test Data is Still There

Checking that it worked is as easy as executing a psql command to verify the data we added earlier is still there:

```shell
psql -d codelab_db -p 5432 \
     -h localhost \
     -U codelab_user \
     -c "SELECT * FROM codelab_table"
```

To clean up, let's close the tunnel with kill $(lsof -t -i :5432).

Haiku

"Hey there," Kew said. "Hi Kew," I replied:

Short lived keys are good
To dust - the keys shall return
But the tunnel lives

There's still room for improvement

We already have a pretty darn secure and auditable system, but at Transcend we strive to minimize public DNS endpoints wherever possible to reduce our attack surface. In the current setup, our bastion still has a public IP address, which is a requirement for EC2 Instance Connect (without using the SSM trick this codelab is leading up to). If we moved our bastion to a private subnet, where it could have no public DNS, we would lose EC2 Instance Connect access. However, we could still access it with AWS Systems Manager: so let's talk about it.

About SSM

Systems Manager comes with a bunch of premade "Documents" that you can run on sets of EC2s, similar-ish to running Ansible Playbooks on servers, if you're familiar. If you aren't familiar with Ansible or other automation tools, no worries: the idea is pretty simple: automate running the same commands on a bunch of machines at once.

In this codelab, we just need to worry about the AWS-StartSSHSession Document, which lets you create SSH sessions to EC2s, with the only requirements being that the EC2:

- Has internet access
- Has an instance profile allowing it to talk to an SSM endpoint
- Has SSM installed and running on the server (which Ubuntu 18.04 LTS AMIs do by default)

Enabling SSM Access

To start, update the VPC module to add some private subnets, along with NAT gateways that will allow those private subnets to access the internet:

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 2.18.0"

  name = "codelab-vpc"
  cidr = "10.0.0.0/16"
  azs  = ["eu-west-1a", "eu-west-1b"]

  # For the NAT gateways
  public_subnets = ["10.0.101.0/24", "10.0.102.0/24"]

  # For the bastion host
  private_subnets = ["10.0.201.0/24", "10.0.202.0/24"]

  # For the RDS instance
  database_subnets = ["10.0.1.0/24", "10.0.2.0/24"]

  # Ensure the private subnets can talk to the internet for SSM
  enable_nat_gateway = true

  # Allow private DNS
  enable_dns_hostnames = true
  enable_dns_support   = true
}
```

Next, in the database security group, update the ingress_cidr_blocks param to be module.vpc.private_subnets_cidr_blocks (notice that it should no longer be wrapped in a list). This is really cool, as it ensures that only private instances can access your database: a huge security win!

On your bastion security group, you can entirely remove the ingress_cidr_blocks and ingress_rules lines, because we no longer need the SSH port open. SSM works by having the instance reach out to the SSM endpoint, so we can eliminate ingress entirely. Think about how cool that is: our instance will have no public DNS and no security group ingress. It's like a dream! If you're like me and dream about highly restricted network access, anyway.

On the bastion module, there are two changes:

- Set associate_public_ip_address to false (or remove the line entirely, which has the same effect)
- Change the subnet_ids input to be module.vpc.private_subnets

Lastly, we need to update the IAM permissions to allow the bastion host to talk to the SSM service endpoints. In your instance_profile_role module, add "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore" to the custom_role_policy_arns list.

And we're all done with our complete terraform setup! Run terraform plan -out plan, verify the changes look good, and run terraform apply plan to finish off.

Haiku

Private subnets rock
But you need a NAT gateway
To get to the web
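Before moving on, it can help to confirm that the bastion has actually registered with SSM, since the agent needs a minute or two after boot. Here's a sketch of a helper, assuming the AWS CLI is configured for the right account and region:

```shell
# Prints the SSM ping status ("Online" when the agent is registered)
# for the instance id passed as the first argument.
ssm_ping_status() {
  aws ssm describe-instance-information \
    --filters "Key=InstanceIds,Values=$1" \
    --query 'InstanceInformationList[0].PingStatus' \
    --output text
}

# Usage, once terraform has applied:
#   ssm_ping_status "$(terraform output instance_id)"
```

If this prints anything other than Online, double-check the NAT gateway, the instance profile, and that the SSM agent is running.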

We're finally at the final step: tunneling through a bastion instance in a private subnet to reach an RDS instance in its own, even more private subnet.

SSH'ing with ProxyCommand

It has long been common to have bastions or jump servers, where someone would SSH onto a system and then SSH from that instance to a different instance that couldn't be reached without first going through the bastion. To help with this, ssh has an option called ProxyCommand. There's a great blog at https://www.cyberciti.biz/faq/linux-unix-ssh-proxycommand-passing-through-one-host-gateway-server/ if you're unfamiliar and want to see examples of its usage in depth, but the main idea is that it allows you to make two SSH jumps in a single command. A command like:

```shell
ssh -o ProxyCommand="ssh -W %h:%p user@bastion.com" other_user@private.server.com
```

is roughly equivalent to:

```shell
ssh user@bastion.com
ssh other_user@private.server.com # run from the bastion.com server
```

How does that help us

We want to use our bastion instance as a jump server in a ProxyCommand to get to our RDS instance, but our bastion no longer has any public DNS that we can put as the host in our ssh command. That's where the SSM Document we talked about earlier, AWS-StartSSHSession, comes in. It uses some SSM TCP magic to create SSH sessions to any server with SSM enabled.

The final command

Quick prereq: install the Session Manager plugin for your CLI: https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-install-plugin.html

Putting this all together, we can run the commands:

```shell
# Generate a throwaway key, overwriting any previous one
echo -e 'y\n' | ssh-keygen -t rsa -f /tmp/temp -N '' >/dev/null 2>&1

# Push the public key to the instance (valid for 60 seconds)
aws ec2-instance-connect send-ssh-public-key \
    --instance-id `terraform output instance_id` \
    --availability-zone `terraform output az` \
    --instance-os-user ubuntu \
    --ssh-public-key file:///tmp/temp.pub

# Tunnel to the RDS endpoint, proxying the SSH connection through SSM
ssh -i /tmp/temp \
    -Nf -M \
    -L 5432:`terraform output rds_endpoint` \
    -o "UserKnownHostsFile=/dev/null" \
    -o "StrictHostKeyChecking=no" \
    -o ProxyCommand="aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters portNumber=%p --region=eu-west-1" \
    ubuntu@`terraform output instance_id`
```

To test that this worked, let's run a query:

```shell
psql -d codelab_db -p 5432 \
     -h localhost \
     -U codelab_user \
     -c "SELECT * FROM codelab_table"
```

Yay. :)

Cleanup

Let's kill the tunnel:

```shell
kill $(lsof -t -i :5432)
```

and then remove all AWS resources:

```shell
terraform destroy -auto-approve
```

Gosh I love terraform :)

Haiku

Got to kill the port
Got to destroy resources
It's time to clean up
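As a closing convenience, you can bake the ProxyCommand into your ~/.ssh/config so that any host that looks like an EC2 instance id is automatically routed through SSM. This is a sketch of the pattern from the Session Manager docs; adjust the region to taste:

```
# ~/.ssh/config
Host i-* mi-*
    ProxyCommand sh -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p' --region eu-west-1"
```

With that in place, the long -o ProxyCommand flag in the final command above becomes unnecessary, and a plain ssh -i /tmp/temp ubuntu@i-... just works.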