Overview

In this post we're going to cover two things:

Setting up unified CloudWatch logging in conjunction with AWS ECS and our Docker containers.

Setting up SSM and enabling the ability to run one bash command across multiple EC2 container instances simultaneously.

What Logs?

Ease of administration for our infrastructure is paramount. It's not the "hottest" topic, but when traffic is burning our servers down and customers are complaining about errors, logs are your best friend.

Also, Docker bragging about unified CloudWatch logs in their "Docker for AWS" service struck me as odd. And then I thought about it - AWS doesn't make it exactly apparent how to set all of this up. Yes, it's there, but as with everything in the AWS Docs, you wind up fishing around for hours on end, combined with lots of tinkering.

What logs? Two sets. First, those from the container instances (servers) themselves:

/var/log/dmesg : the most recent kernel messages

/var/log/messages : more or less all logs, with timestamps

/var/log/docker : logs directly from docker

/var/log/ecs/ecs-init.log : logs from the initialization of ECS

/var/log/ecs/ecs-agent.log : logs from the agent that's communicating between ECS and this instance

The second set are logs straight from your task containers. For example, if I have a Node application running in my task, the logs from that application will be sent to CloudWatch as well.

After we set them up, all of them will be visible in:

AWS Console > CloudWatch > Logs.

Run Command to Rule Them All?

AWS has a fun sysadmin tool called Run Command. It allows us to:

a) select a set of EC2 instances

b) run one command

c) have that command hit all of the instances

d) return the output to us

Optionally, we can pipe that output to an S3 bucket for future consumption, or even send it to SNS so that we can receive it via email, text, etc.
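For a taste of what that looks like from the CLI, here's a sketch of a Run Command invocation that ships its output to S3. The bucket name and instance ID are placeholders, not values from this post:

```shell
# Run one shell command on an instance via SSM Run Command,
# shipping the output to an S3 bucket for later consumption.
# (Bucket name and instance ID are placeholders -- substitute your own.)
aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --targets "Key=instanceids,Values=i-0123456789abcdef0" \
  --parameters 'commands=["uptime"]' \
  --output-s3-bucket-name "my-runcommand-output-bucket"
```

An SNS notification can similarly be attached with the --notification-config option.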

It should be pretty apparent how being able to run something on all of our servers simultaneously is a HUGE win for sysadmin work.

Keeping Things DRY

I've already written a guide on setting up a fully load balanced and autoscaled ECS environment. Therefore I'm not going to walk step by step through how to set up an ECS cluster. Instead I'll (a) reference points in that guide and (b) give a good explanation of how to apply it to any process.

Also you can find the script and policy we'll use here:

https://github.com/jcolemorrison/ecs-logging-scripts

Getting IAM Right

In order to use logs we'll need to begin with IAM. When setting up ECS, the EC2 instances that get launched into our cluster, aka Container Instances, require an IAM role. This role allows them to communicate with the ECS service endpoint and to be messaged/managed by the cluster they've joined.

Just to get them up and running, only one policy is required:

AmazonEC2ContainerServiceforEC2Role

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:CreateCluster",
        "ecs:DeregisterContainerInstance",
        "ecs:DiscoverPollEndpoint",
        "ecs:Poll",
        "ecs:RegisterContainerInstance",
        "ecs:StartTelemetrySession",
        "ecs:Submit*",
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}

In the Guide, we create this role and attach this policy around Step 20.

If you're setting up IAM Roles from scratch, you'd create a new role in IAM, search for the policy AmazonEC2ContainerServiceforEC2Role, and attach it.
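As a rough CLI sketch of that from-scratch setup (the role name matches what we use later in this post; the trust policy file name is an assumption for illustration):

```shell
# Create the role with an EC2 trust policy. trust-policy.json is assumed
# to contain the standard ec2.amazonaws.com assume-role document.
aws iam create-role \
  --role-name EcsInstanceRole \
  --assume-role-policy-document file://trust-policy.json

# Attach the AWS managed ECS container instance policy
aws iam attach-role-policy \
  --role-name EcsInstanceRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role

# EC2 instances consume roles through an instance profile
aws iam create-instance-profile --instance-profile-name EcsInstanceRole
aws iam add-role-to-instance-profile \
  --instance-profile-name EcsInstanceRole \
  --role-name EcsInstanceRole
```

The console does the instance profile part for you automatically; the CLI makes you do it explicitly.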

However, to allow for full Container Instance to CloudWatch functionality, we need one more policy attached.

1) In IAM, Select Policies

If you get a splash screen, click through it.

2) Click Create Policy

3) Click Create Your Own Policy

4) Name it EcsInstanceLogsRole

5) For Description put Role for Creating and Putting Logs to CloudWatch

6) Insert the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:DescribeLogStreams"
      ],
      "Resource": [
        "arn:aws:logs:*:*:*"
      ]
    }
  ]
}

Even though this overlaps with the "logs:CreateLogStream" and "logs:PutLogEvents" permissions on our ECS role, that's fine. They won't collide, and we shouldn't modify the ECS role directly. The other reason this works out is that if we ever need to allow CloudWatch logs for other services that aren't ECS, we're ready to go.

It should look like:

7) Click Create Policy

Now this policy will be available and search-able for any future IAM roles in any region or account.
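If you'd rather create the policy from the CLI, a minimal sketch looks like this (it assumes you've saved the JSON above locally with the file name shown):

```shell
# Create the custom CloudWatch Logs policy from the JSON document above,
# saved locally as ecs-instance-logs-role.json (assumed file name).
aws iam create-policy \
  --policy-name EcsInstanceLogsRole \
  --description "Role for Creating and Putting Logs to CloudWatch" \
  --policy-document file://ecs-instance-logs-role.json
```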

In the Guide, we would create and attach this policy after Step 25.

If you're setting up your own ECS Instance Role, you would:

a) Create a Role, name it something like EcsInstanceRole

b) Attach to it the AmazonEC2ContainerServiceforEC2Role

c) Create the above EcsInstanceLogsRole

d) Attach EcsInstanceLogsRole to the EcsInstanceRole

8) Add the Simple Systems Manager Role

Much like how we added the EcsInstanceLogsRole, in order to use Run Command on our ECS instances, we need to add one more policy.

If you've followed the Guide, you've actually already added it at Step 21. Oddly enough, I put it in as a mandatory step out of pure habit.

If you're creating your own Role, you would do the following, assuming you've created an IAM Role called EcsInstanceRole that includes both the AmazonEC2ContainerServiceforEC2Role policy and the EcsInstanceLogsRole policy we created.

a) Select that role ( EcsInstanceRole )

b) Under the Permissions Tab, click Attach Policy

c) In the filter input, search for AmazonEC2RoleforSSM and attach it.
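The same attachment can be sketched from the CLI, using the role name above:

```shell
# Attach the AWS managed SSM policy so Run Command can reach the instances
aws iam attach-role-policy \
  --role-name EcsInstanceRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM
```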

Simple as pie. Now you have the IAM role you need. Your new role should look like:

Now let's talk about setting up the servers to actually send these logs and receive commands.

Creating an Empty Cluster

I'm including this simply because, if you've created a cluster from AWS's first-time wizard OR created it using anything other than an empty cluster, AWS will not respect changes to your scaling groups or launch configurations easily. Instead, when you choose to scale up through ECS, it will fall back and use the settings created when spinning up your cluster for the first time.

To do so:

a) Head over to ECS (aka EC2 Container Service as it's called in the AWS Console)

b) Click on Clusters

c) Click Create Cluster

d) Check Create an Empty Cluster

e) Give your cluster a name. I'll call mine ECSLogsCluster

f) Remember the Cluster Name

In the guide this takes place during Step 17
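For reference, creating the empty cluster from the CLI is a one-liner (using the cluster name above):

```shell
# Create an empty ECS cluster -- no instances are launched by this call
aws ecs create-cluster --cluster-name ECSLogsCluster
```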

Setting up / Modifying a Launch Configuration for Logging

There are three ways to launch instances into an ECS Cluster:

1) Manually.

2) Manually Automated - meaning you script it with something like Chef, Terraform, even CloudFormation

3) Using AutoScaling

I'm not really sure why anyone would do 2). Even with CloudFormation, it's still best to use AutoScaling groups. 1) may be useful for testing new server configurations. 3) should be the way to go.

ANYWAY. Here's what we want/need to achieve, on boot, when launching EC2 instances in the context of an ECS Cluster:

a) Tell it what cluster to join

b) Install the awslogs tool

c) Setup the default points to send our Logs to

d) Set the region to send our logs to (which by default will be the one our instance resides in)

e) Setup logging meta data so that we can tell which logs belong to which clusters, services and instances

f) Install the SSM Agent to allow for Run Command

To achieve all of these tasks, we'll use User Data. When launching an EC2 instance or creating a Launch Configuration, we can provide a script to be run at launch. In our case we'll use this one:

https://github.com/jcolemorrison/ecs-admin-helpers/blob/master/ecs-logs-init.sh

This script just does (a) through (f) from above. It's pieced together from lots of usage and some snippets here and there from the AWS Docs. Be sure to replace YOURCLUSTERNAMEHERE with your cluster name.
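To give a sense of what the script covers, here's a condensed sketch of the (a)-(f) steps, assuming the Amazon Linux ECS-optimized AMI. Treat the linked script as the authoritative version; exact package names and service commands can vary by AMI version:

```shell
#!/bin/bash
# Condensed sketch of an ECS logging user data script.

# (a) Tell the ECS agent which cluster to join
echo "ECS_CLUSTER=YOURCLUSTERNAMEHERE" >> /etc/ecs/ecs.config

# (b) Install the awslogs tool
yum install -y awslogs

# (d) Point the awslogs agent at this instance's region, derived from
# the instance metadata's availability zone (strip the trailing letter)
region=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone | sed 's/.$//')
sed -i -e "s/region = us-east-1/region = ${region}/" /etc/awslogs/awscli.conf

# (c, e) /etc/awslogs/awslogs.conf is where the files to ship (dmesg,
# messages, docker, ecs-init.log, ecs-agent.log) and their stream names
# get configured -- omitted here for brevity; see the linked script

# (f) Install and start the SSM agent so Run Command works
# (install path/method may differ depending on the AMI)
yum install -y amazon-ssm-agent
start amazon-ssm-agent

# Start shipping logs and keep doing so across reboots
service awslogs start
chkconfig awslogs on
```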

So, in the Guide, this process of setting up a Launch Configuration and AutoScaling group to put instances into our cluster occurs here. During Step 37 instead of using the user data that I specified in that step, you'd instead just upload the file like so:

or copy and paste it in.

Note: you must assign all container instances a public IP address because outside access is needed to communicate with ECS. If they're in a private subnet and cannot have a public IP, then you must route them through a NAT Instance or NAT Gateway.

If you haven't gone through the guide, the quick run down of this is to:

1) Create a Launch Configuration

2) For the AMI, in the Community AMIs, find the AWS ECS Optimized Image. I know it's in Community AMIs, but it's AWS's constantly kept-up-to-date AMI. The list of the most recent ones can always be found here:

http://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html

3) Select whatever you'd like for instance type

4) In the Configure Details screen, make sure to use the IAM role we created (EcsInstanceRole) and upload the User Data file like so:

5) Continue on; set up desired storage and tags, and finalize.

After this, you'd create an AutoScaling group from this launch configuration and tell it how many instances to keep up and to scale up to. Again, you can see an in-depth walk through in the guide mentioned above.
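Sketched from the CLI, the launch configuration and AutoScaling group setup looks roughly like this. The names, AMI ID, key pair, and subnet IDs are placeholders, not values from this post:

```shell
# Create a launch configuration using the ECS instance role and the
# user data script. (Names, AMI ID, and subnets are placeholders.)
aws autoscaling create-launch-configuration \
  --launch-configuration-name ecs-logs-lc \
  --image-id ami-XXXXXXXX \
  --instance-type t2.micro \
  --iam-instance-profile EcsInstanceRole \
  --associate-public-ip-address \
  --user-data file://ecs-logs-init.sh

# Create the AutoScaling group that keeps instances in the cluster
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name ecs-logs-asg \
  --launch-configuration-name ecs-logs-lc \
  --min-size 2 --max-size 4 --desired-capacity 2 \
  --vpc-zone-identifier "subnet-XXXXXXXX,subnet-YYYYYYYY"
```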

But I already have a launch configuration, or set of servers, what do I do now?

Good question:

The absolute simplest thing to do, is to

1) Create a new launch configuration

2) Give it the new IAM role and User Data Script

3) In your current AutoScaling group, go down to the edit panel, here:

4) Change the launch configuration to the new one you created

5) Kill off your current instances one by one (or all at the same time if you so desire)

6) Assuming you have the scaling group set to keep to a particular desired level, it will spin back up that number of instances with the new launch configuration and thus IAM Role and User Data script.
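If you're scripting this, the launch configuration swap in step 4 is a single call (group and configuration names here are placeholders):

```shell
# Point an existing AutoScaling group at a new launch configuration.
# Existing instances keep the old one until they're replaced.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name ecs-logs-asg \
  --launch-configuration-name ecs-logs-lc-v2
```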

IF you're automating the launch of your instances with another script, just make sure any new ones get the new IAM role and the mentioned User Data script.

If you're launching instances manually... stop, and use a launch configuration.

Just to note here, the logs:

/var/log/dmesg

/var/log/messages

/var/log/docker

/var/log/ecs/ecs-init.log

/var/log/ecs/ecs-agent.log

Will all be available in CloudWatch once the instances have come up. We still want to set up the Task Container logs though.

Reading Logs from Tasks (Containers)

This is a one-two punch process:

1) Go to CloudWatch > Logs and then Actions > Create Log Group. Name your log group whatever you'd like.

2) When creating our Task Definitions, define the logs in the Task Definition parameters.

If you're working with the JSON version, your params would look like:

{
  "containerDefinitions": [
    {
      // ... settings in your container
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "ExampleTaskLogs", // <-- name of log group
          "awslogs-region": "us-east-1", // <-- region
          "awslogs-stream-prefix": "ExampleService" // <-- whatever is a nice prefix for your log groups
        }
      }
    }
  ]
  // rest of your taskDefinition
}
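From the CLI, the one-two punch can be sketched as follows; the task definition file name is an assumption, and the log group name matches the example params:

```shell
# 1) Create the log group first -- logs will not flow without it
aws logs create-log-group --log-group-name ExampleTaskLogs

# 2) Register the task definition containing the awslogs configuration
#    (task-definition.json is an assumed local file name)
aws ecs register-task-definition --cli-input-json file://task-definition.json
```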

Or if it's in the console, when updating or adding a container definition:

And assuming we've set up our IAM roles and CREATED THE LOG GROUP IN CLOUDWATCH FIRST, it will start piping logs to the console as soon as the tasks from this new task definition are up and running.

And Once All The Logs are Setup:

We'll be able to go to

CloudWatch > Logs and see something like:

Which will have all of our logs from all of our containers in one place. They're not all mashed together either. If you drill down into, say, /var/log/dmesg, you'll see a separate log stream for each instance.

Additionally, the ExampleTaskLogs in the above screenshot are the logs for the containers created from the task definition we specified to use logs on. Drilling down into that will show a separate log stream for each individual task:

and then drilling down into them one more time:

will show the logs from my sample Loopback application.

Using the Run Command:

So that's all well and good, but how about that "one command to hit all the servers" function? Assuming you've created a launch configuration or multiple servers using the script I provided:

1) Head over to the EC2 console

2) On the sidebar click on Run Command

3) In here click on Run Command

4) Select AWS-RunShellScript

5) Click Select Instances

6) Click the instances you'd like to run the command on:

7) Type in the shell command you'd like to run. In our case let's do:

service --status-all

8) Leave everything else as is and click Run.

This will run the command. Just click View results; it keeps a log of the different commands you've run in the past. Select the command you'd like to see the results of:

Click over to the output tab and click view output

Here we'll see the return of our command!
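The console flow above maps to two CLI calls, sketched here with placeholder instance IDs:

```shell
# Send the same shell command to a set of instances and capture the
# command ID. (Instance IDs are placeholders.)
sh_command_id=$(aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --instance-ids "i-0123456789abcdef0" "i-0fedcba9876543210" \
  --parameters 'commands=["service --status-all"]' \
  --query "Command.CommandId" --output text)

# Fetch the output for one instance once the command has finished
aws ssm get-command-invocation \
  --command-id "$sh_command_id" \
  --instance-id "i-0123456789abcdef0" \
  --query "StandardOutputContent" --output text
```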

Great. Need to throw a quick update to all current instances but can't take them all down? Do this. Want to pipe logs via a curl? Do this. Want to grab everything from a particular service instantly? Do this. It's amazing.

Summary

The basic TL;DR of how to set up monitoring comes down to this:

1) Create an IAM role for your instances that includes the policies:

AmazonEC2ContainerServiceforEC2Role

AmazonEC2RoleforSSM

and the custom policy here:

https://github.com/jcolemorrison/ecs-logging-scripts/blob/master/ecs-instance-logs-role.json

2) Make sure you're using an empty ECS cluster, not one created from AWS's wizard or auto setup

3) Use the following script as your UserData launch script for the EC2 Instances:

https://github.com/jcolemorrison/ecs-logging-scripts/blob/master/ecs-logs-init.sh

Steps 1-3 are enough for logs from the container instances themselves and for using Run Command. But not for logs directly from containers.

4) Set up a log group in CloudWatch for the Task Definitions that you'd like to capture logs from.

5) In your Task Definitions use the equivalent of:

{
  "containerDefinitions": [
    {
      // ... settings in your container
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "ExampleTaskLogs", // <-- name of log group
          "awslogs-region": "us-east-1", // <-- region
          "awslogs-stream-prefix": "ExampleService" // <-- whatever is a nice prefix for your log groups
        }
      }
    }
  ]
  // rest of your taskDefinition
}

either in the console directly or by JSON template.

That will allow you to see all logs in CloudWatch, and run shell scripts across all of your instances simultaneously!

6) Check out The Full Guide Here