Monitoring is critical for a secure, high-performing, resilient, and efficient cloud infrastructure. This blog post summarizes all the bits and pieces you need to think about when monitoring your AWS account.

Overview

The following mind map provides an overview of monitoring goals as well as all services and features related to monitoring:

Amazon CloudWatch Events: subscribe to notifications from your AWS infrastructure.

Service Specific Events: subscribe to notifications from specific services (partially legacy).

Amazon CloudWatch Metrics: get insights into the utilization and condition of your resources.

Amazon CloudWatch Alarms: define alarms based on metrics to get notified about incidents or resolve problems automatically.

Logging: search and analyze logs from AWS services or your applications.

Amazon Simple Notification Service (SNS): deliver notifications from your AWS infrastructure to responsible persons or teams.

Dashboards: get an overview of your system’s status.

Note that I will not cover any 3rd party monitoring solutions in this blog post.

And you thought monitoring your AWS infrastructure was easy?

Amazon CloudWatch Events

Each CloudWatch event indicates an operational change in your AWS account. More and more services publish CloudWatch events, as shown in the following excerpt of the mind map. A rule defines a filter and routes matching events to a target. Supported targets include Amazon SNS, AWS Lambda, Amazon SQS, and many more.

Here are a few examples of how we use CloudWatch events for monitoring:

When AWS needs to reboot virtual machines, an AWS Health Event is published to announce the reboot in advance.

When someone tries to log in to the AWS Management Console as the root user, an AWS Console Sign In via CloudTrail event is published.

When a deployment step fails, CodePipeline creates an event.
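To make the idea of a rule's filter more concrete, here is a minimal sketch of event patterns for the three examples above. The pattern values reflect the event formats as I understand them from the AWS documentation, so double-check them before use. The `matches` helper and `sample_event` are hypothetical and only imitate exact-value matching locally; the real service supports more operators.

```python
import json

# Event patterns for the three examples above (assumed field values,
# verify against the current AWS documentation).
PATTERNS = {
    "health-event": {
        "source": ["aws.health"],
        "detail-type": ["AWS Health Event"],
    },
    "root-console-sign-in": {
        "detail-type": ["AWS Console Sign In via CloudTrail"],
        "detail": {"userIdentity": {"type": ["Root"]}},
    },
    # No detail-type here, so failures at any CodePipeline level match.
    "pipeline-failed": {
        "source": ["aws.codepipeline"],
        "detail": {"state": ["FAILED"]},
    },
}


def matches(pattern, event):
    """Simplified local matcher: nested keys, exact values only."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            # Nested pattern: the event must have a nested object that matches.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of allowed values.
            return False
    return True


# Hypothetical, heavily shortened sign-in event for illustration.
sample_event = {
    "source": "aws.signin",
    "detail-type": "AWS Console Sign In via CloudTrail",
    "detail": {"userIdentity": {"type": "Root"}, "eventName": "ConsoleLogin"},
}

for name, pattern in PATTERNS.items():
    print(name, matches(pattern, sample_event))

# A rule expects the pattern as a JSON string.
print(json.dumps(PATTERNS["root-console-sign-in"]))
```

With boto3, a JSON string like the last one can be passed as the `EventPattern` argument of the events client's `put_rule` call, and `put_targets` then attaches a target such as an SNS topic.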

Next, we will have a look at service-specific events.

Service Specific Events

In addition to CloudWatch events, some services publish service-specific events. Typically, service-specific events are published to Amazon SNS or sent via email. The following figure shows an overview of service-specific events.

Here are a few examples of how we use service-specific events for monitoring:

We use AWS Budget Notifications to monitor current and forecasted costs of an AWS account.

We subscribe to Amazon RDS events to get notified about maintenance windows or issues with our database instances.

We configure AWS Trusted Advisor to send security, performance, cost savings, and reliability advice via email.
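As an illustration of the RDS example, the following sketch builds the parameters for an RDS event subscription that publishes to an SNS topic. The subscription name, the topic ARN, and the chosen event categories are assumptions for illustration only. The `subscribe` helper is hypothetical glue around boto3's `create_event_subscription` and is not called here, because it needs boto3 and AWS credentials.

```python
def rds_subscription_params(name, topic_arn,
                            categories=("maintenance", "failure", "availability")):
    """Build keyword arguments for an RDS event subscription on DB instances."""
    return {
        "SubscriptionName": name,
        "SnsTopicArn": topic_arn,
        "SourceType": "db-instance",
        "EventCategories": list(categories),
        "Enabled": True,
    }


def subscribe(params):
    """Create the subscription; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the sketch runs without boto3 installed

    return boto3.client("rds").create_event_subscription(**params)


# Hypothetical name and topic ARN, replace with your own.
params = rds_subscription_params(
    "db-instance-alerts",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",
)
print(params)
```

Keeping the parameter construction separate from the API call makes it easy to review what will be subscribed before anything is created in the account.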

The next step is to use CloudWatch metrics to get insights into the utilization and condition of your resources.

Amazon CloudWatch Metrics

Almost every AWS resource sends metrics to CloudWatch, allowing us to look into the black boxes also known as EC2, S3, RDS, and so on.

Discussing every metric available through CloudWatch is out of scope for this blog post. Have a look at the Amazon CloudWatch Metrics and Dimensions Reference if you want to dive into the details.
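To give a rough idea of how a metric is addressed by namespace, metric name, and dimensions, here is a small sketch that builds a request for the average and maximum CPU utilization of a single EC2 instance. The instance ID, the five-minute period, and the one-hour window are assumptions. The `fetch` helper is hypothetical glue around boto3's `get_metric_statistics` and is not called here, because it needs boto3 and AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def cpu_utilization_request(instance_id, hours=1, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # seconds per data point
        "Statistics": ["Average", "Maximum"],
    }


def fetch(request):
    """Send the request; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the sketch runs without boto3 installed

    response = boto3.client("cloudwatch").get_metric_statistics(**request)
    return response["Datapoints"]


# Hypothetical instance ID, replace with your own.
request = cpu_utilization_request("i-0123456789abcdef0")
print(request["Namespace"], request["MetricName"], request["Dimensions"])
```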

We usually keep an eye on the following metrics.