AWS launched the Serverless Lens for the Well-Architected Tool in the console, covering topics such as operational excellence, reliability, security, performance, and cost. This is fantastic news for all engineers who want to gain more knowledge and experience with the best practices of serverless architectures.

A key component of operational excellence is tracing, or more accurately, distributed tracing. As our applications become more loosely coupled and are composed of more services, resources, and APIs, distributed traces help us understand how those parts communicate. To pinpoint traces efficiently, we can use tagging. When done right, tags become super helpful for slicing and dicing an event out of tons of information, or for data aggregation.

In this post, we will look at some examples of tagging in traces and demonstrate how to accomplish it using Epsagon.

Application health using distributed traces

Following the new Well-Architected Serverless Lens, let’s examine the OPS 1 question: How do you evaluate your Serverless application health? The answer starts with tracing.

Let’s take a retail application as an example. This application is responsible for all of our retail needs: stock management, payment, catalog, and more. Good tracing can help us turn an architecture diagram of such an application into a real service map with metrics. Service maps allow us to understand connections in distributed applications and to detect performance issues and bottlenecks. For troubleshooting, it is also easier to look at a single end-to-end trace, including the payload and the relevant logs, than to correlate logs between different services. An example from Epsagon shows how root cause analysis can be done easily with a visual trace.

Tagging traces

While tracing, especially when automated, is a powerful tool, sometimes we need to pinpoint a specific event in our application, or detect trends based on a unique business dimension. Tagging adds more context to an existing trace in the form of key=value. For example, we can add the following tag: `userId=123`. In this scenario, we will be able to filter all traces that match a specific user in our application.

These are some good tags that can be used:

Identifiers – can help us pinpoint an event based on our application’s unique identifiers. For example, user ID, customer ID, item ID, etc.

Flow control – can help us understand what happened in the code. For example, event type, item category, etc.

Business metrics – can help us understand some unique business KPIs. For example, the quantity of items in a purchase, views of an item, etc.

Tags help us in the following scenarios:

Correlate incidents and customers – Looking for a problem that happened to a specific user/customer in our application.

Insights into the customer experience – Understanding the performance metrics of a specific event in our system.

Business trends – Looking at trends for business KPIs.

Tagging traces with Epsagon

Let’s use the previous scenarios on our blog site application. Using Epsagon, it is pretty straightforward to add tags to the current trace in Lambda functions:

    import epsagon

    def handler(event, context):
        # Identifier tag: lets us filter traces by the calling user.
        epsagon.label('userId', event['headers']['user'])
        # Flow control tag: records which kind of event was handled.
        # Assumes event['body'] has already been parsed into a dict.
        epsagon.label('eventType', event['body']['event'])
        ...
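The three scenarios above (correlating incidents with customers, customer experience insights, and business trends) come down to filtering and aggregating traces by tag. Here is a minimal, backend-agnostic Python sketch of that idea. It is not Epsagon's implementation or API: the `Trace` class, the helper functions, and the sample data are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Trace:
    """Hypothetical, simplified trace: an id, a duration, and key=value tags."""
    trace_id: str
    duration_ms: float
    error: bool = False
    tags: dict = field(default_factory=dict)


def filter_by_tag(traces, key, value):
    """Return the traces whose tag `key` equals `value`."""
    return [t for t in traces if t.tags.get(key) == value]


def average_duration_by_tag(traces, key):
    """Group traces by the value of tag `key` and average their durations."""
    groups = defaultdict(list)
    for t in traces:
        if key in t.tags:
            groups[t.tags[key]].append(t.duration_ms)
    return {value: mean(durations) for value, durations in groups.items()}


def sum_numeric_tag(traces, key):
    """Sum a numeric tag (for example, a purchase quantity) across traces."""
    return sum(t.tags[key] for t in traces if key in t.tags)


# Made-up sample data, for illustration only.
traces = [
    Trace("t1", 120.0, tags={"userId": "123", "eventType": "purchase", "quantity": 2}),
    Trace("t2", 480.0, error=True,
          tags={"userId": "456", "eventType": "purchase", "quantity": 1}),
    Trace("t3", 40.0, tags={"userId": "123", "eventType": "view"}),
]

# Correlate incidents and customers: failed traces for one user.
failed_for_user = [t.trace_id for t in filter_by_tag(traces, "userId", "456") if t.error]

# Insights into the customer experience: average latency per event type.
latency_by_event = average_duration_by_tag(traces, "eventType")

# Business trends: total items purchased across all purchase traces.
items_purchased = sum_numeric_tag(filter_by_tag(traces, "eventType", "purchase"), "quantity")
```

A real tracing backend would run these queries over indexed trace data rather than in-memory lists, but the shape of the questions is the same: pick a tag key, then filter or aggregate on its values.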