Gain greater insight into system behavior in aggregate, across multiple dimensions.

A “metric” is a measurement, or value, representing the operational state of your system at a given time. Examples include the amount of free memory on a web host or the number of users logged into your site at a given time. Factor in hundreds (or thousands) of hosts, availability zones, hardware batches, or service endpoints, and you’re suddenly dealing with a significant logistical challenge.

What’s the difference in average free memory between API hosts in my Ashburn and San Jose data centers? How about the difference in CPU utilization for the EC2 clusters using 4 r5.xlarge instances instead of 8 r5.larges? How do you stay on top of it all?

“As we grow our number of metrics and/or want to make more sense out of them, we need to be more systematic.” – Metrics 2.0 (metrics20.org)

For time-series data, it’s all about metadata, which may sound familiar if you take pictures with your smartphone. Metadata isn’t the picture itself but the time of day, location, camera type, settings, and so on, all of which help you organize albums and find your photos more easily. The same principle applies to your operational time-series telemetry: the additional information gives each metric more depth. Known in the software industry as metric tags or dimensions, this metadata provides greater insight into your metrics.

Why are metric tags important?

Traditionally in a large enterprise, automated monitoring solutions are a one-time, “set it and forget it” situation based on the defined, up-front monitoring objectives. These solutions are then turned over to the operations team. With static infrastructure, this works, and these checks and metrics tend to remain unchanged for extended periods of time. However, static infrastructure isn’t as common as it used to be – and that’s the problem.

Today’s dynamic, ever-changing infrastructure (including web and application servers) makes it difficult and impractical to apply the same static analysis and operations when monitoring it.

The world of IT has always been a moving target. Enterprise data centers are continuously subject to transformation as new software, hardware and process components are deployed and updated. Today’s dynamic infrastructure is often a fusion of bare metal, virtualized instances, orchestrated container-based applications, serverless, and cloud-deployed services. According to the IEEE, this puts an immense burden on monitoring needs. Not only are there thousands of different parameters to monitor, but the addition and modification of service level objectives (SLOs) may happen continuously.

Stream Tags from Circonus Can Help

Circonus’ metric tag implementation is known as Stream Tags. It improves infrastructure monitoring by adding metric metadata that helps you label and organize your metrics data. Stream Tags provide all of the capabilities of the Metrics 2.0 specification: self-describing, orthogonal tags for every dimension. In addition, Stream Tags offer extra capabilities as detailed below.

CPU metric using Stream Tags to specify host, role, state, and zone

What Makes Stream Tags Better?

Stream Tags offer the ability to base64 encode tag names and values. This supports multi-byte character sets as well as the full list of ASCII symbols, which is a notable advantage for non-English speakers.

There are, of course, some fun things you can do with this too. We aren’t saying that you should embed emoji within Stream Tags, but should you so desire, that’s certainly possible.

user_order|ST[drink:🍺,food:🌯]

becomes

user_order|ST[b"ZHJpbms=":b"8J+Nug==",b"Zm9vZA==":b"8J+Mrw=="]
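The encoding step above can be reproduced in a few lines of Python. This is a minimal sketch, not Circonus code: the helper names `b64_tag` and `encode_stream_tags` are hypothetical, and it assumes tag names and values are UTF-8 encoded before being base64 encoded, which is consistent with the encoded line above.

```python
import base64


def b64_tag(text: str) -> str:
    """Return text in the b"<base64>" form used for encoded Stream Tags.

    Assumption: the text is UTF-8 encoded before base64 encoding.
    """
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f'b"{encoded}"'


def encode_stream_tags(metric: str, tags: dict[str, str]) -> str:
    """Build <metric>|ST[...] with every tag name and value base64 encoded."""
    # No whitespace is emitted between the tag delimiters.
    pairs = ",".join(f"{b64_tag(name)}:{b64_tag(value)}" for name, value in tags.items())
    return f"{metric}|ST[{pairs}]"


# U+1F37A (beer mug) and U+1F32F (burrito), written as escapes for portability.
print(encode_stream_tags("user_order", {"drink": "\U0001F37A", "food": "\U0001F32F"}))
```

Encoding only the names or values that need it would work just as well; encoding everything simply keeps the sketch short.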

Tag, You’re It

Let’s jump in and take a look at what stream tagged metrics actually look like. Above, we mentioned a metric that represents the amount of free memory on a host. We’ll call this metric “meminfo`MemFree”. This is a metric that you’ll find associated with any physical or virtual host, but whose average across many hosts would produce a broad and unhelpful aggregate. Adding a Stream Tag such as “environment” allows us to examine a segment of these metrics, such as “environment:development” or “environment:production”. Adding the environment tag to this metric would give us something like

meminfo`MemFree|ST[environment:production]

Now, we can use an aggregate such as the average free memory of our production environment to get an idea of how that differs from other environments. Stream Tag names and values are separated by colons and enclosed within square brackets. The tag name and value section is prefixed by “ST” and separated from the metric name by a “|” character.

Let’s take it a bit further. Say I want to know if our new release is going to blow out our cluster because we forgot to call free() in one important spot in the code. We can add a tag “deploy_pool” to the metric, which will now look like

meminfo`MemFree|ST[environment:production,deploy_pool:blue]

Now we can compare memory usage between blue and green deployment pools using Stream Tags, and determine if the new release has a memory leak that needs to be addressed. This technique can also be used for traditional A/B testing, or canary based feature deployment.
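As a rough illustration of that comparison, the sketch below groups samples by the value of one tag and averages each group. It is an assumption-laden example rather than how Circonus performs aggregation: `parse_tags` and `average_by_tag` are hypothetical helpers, only plain-text (non-base64) tags are handled, and the readings are invented.

```python
from collections import defaultdict
from statistics import mean


def parse_tags(metric: str) -> tuple[str, dict[str, str]]:
    """Split 'name|ST[k:v,...]' into (name, tags). Plain-text tags only."""
    name, sep, rest = metric.partition("|ST[")
    if not sep or not rest.endswith("]"):
        return metric, {}  # no Stream Tags present
    tags = {}
    for pair in rest[:-1].split(","):
        if pair:
            # Split on the first colon only; tag values may contain colons.
            key, _, value = pair.partition(":")
            tags[key] = value
    return name, tags


def average_by_tag(samples, tag):
    """Average sample values, grouped by the value of one tag."""
    groups = defaultdict(list)
    for metric, value in samples:
        _, tags = parse_tags(metric)
        if tag in tags:
            groups[tags[tag]].append(value)
    return {tag_value: mean(values) for tag_value, values in groups.items()}


# Hypothetical free-memory readings in MB, invented for illustration.
samples = [
    ("meminfo`MemFree|ST[environment:production,deploy_pool:blue]", 2048),
    ("meminfo`MemFree|ST[environment:production,deploy_pool:blue]", 1024),
    ("meminfo`MemFree|ST[environment:production,deploy_pool:green]", 4096),
]
averages = average_by_tag(samples, "deploy_pool")
print(averages)
```

A steadily shrinking average for one pool relative to the other would be the signal that the new release is leaking memory.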

Further segmenting is possible as needed: we could add kernel version, instance type, and so on as Stream Tags to delineate metadata boundaries.

Stream tag showing web server endpoint latency metadata

With Stream Tags, we are also able to apply as many labels and tags as we need to properly organize our metrics. Bear in mind, each unique combination of Stream Tags represents a new time series, which can dramatically increase the amount of data stored. So, it’s important to be careful when using Stream Tags to store dimensions with high cardinality, such as user IDs, email addresses, or other unbounded sets of values.
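To get a feel for how quickly the number of time series can grow, the sketch below multiplies the number of distinct values per tag. It is a back-of-the-envelope estimate under stated assumptions: the function name `series_count` and all of the counts are hypothetical, and the product is an upper bound that is only reached if every combination of tag values actually occurs.

```python
from math import prod


def series_count(distinct_values_per_tag: dict[str, int]) -> int:
    """Upper bound on distinct time series for one metric name.

    Each unique combination of tag values is its own series, so the bound
    is the product of the number of distinct values of each tag.
    """
    return prod(distinct_values_per_tag.values())


# Hypothetical counts for a set of bounded, low-cardinality tags.
bounded = {"environment": 3, "deploy_pool": 2, "zone": 4}
# The same tags plus one unbounded dimension, e.g. a user ID.
with_user_id = {**bounded, "user_id": 100_000}

print(series_count(bounded))
print(series_count(with_user_id))
```

The bounded tags stay in the tens of series, while adding a single unbounded tag multiplies that figure by the number of users.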

Do I Really Need Stream Tags?

If your infrastructure is dynamic, as is increasingly common, then yes.

Stream Tags allow the flexibility needed by today’s infrastructure monitoring. They make it possible to filter, aggregate and organize metrics in multiple dimensions. Analysis and operations can be applied to a set of metrics rather than individual metrics, allowing for service level monitoring in addition to resource monitoring.

Implemented correctly, Circonus’ Stream Tags are critical to monitoring modern infrastructure, because they give admins the power to aggregate metrics at every level.

Where possible, tagging can be automatic: the metrics collection agent may be able to detect metadata from the source, e.g. a role: tag from Chef, availability-zone: and instance-type: from AWS, labels from Google Cloud, etc. That metadata can then be added automatically as Stream Tags.

How fancy can I make them?

You can use Stream Tags to organize metrics with a set of key characteristics such as role, zone, department, etc. At Circonus, we append those key characteristics to the metric name following this format:

<metric name>|ST[<tag name>:<tag value>, … ]

where

<tag name> may consist of any number of characters in [`+A-Za-z0-9!@#\$%^&"'\/\?\._-]

<tag value> may consist of any number of characters in [`+A-Za-z0-9!@#\$%^&"'\/\?\._-:=]

EXAMPLE:

CPU|ST[mode:wait_io,host:wnode15,role:webserver,zone:us-east1b]

The example above shows a CPU metric for wait_io from a webserver with hostname wnode15, in the us-east1b availability zone.

<tag-name> and <tag-value> may also consist of base64 characters enclosed in double quotes and prefixed with ‘b’

EXAMPLE:

Base64encodedMetric|ST[b"X19uYW1lX18=":b"cHJvbWV0aGV1c19yZW1vdGVfc3RvcmFnZV9zdWNjZWVkZWRfc2FtcGxl"]

Please note that there must not be any whitespace between tag delimiters.
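The format rules above can be sketched as a small parser. This is an illustration under stated assumptions, not Circonus’ own parser: `parse_stream_tags` and `decode_token` are hypothetical names, base64 payloads are assumed to be UTF-8 text, and the character-class checks for plain-text names and values are omitted for brevity.

```python
import base64
import re

# A token of the form b"<base64 characters>".
_B64_TOKEN = re.compile(r'^b"([A-Za-z0-9+/=]*)"$')


def decode_token(token: str) -> str:
    """Decode a b"..." token; return plain-text tokens unchanged."""
    match = _B64_TOKEN.match(token)
    if match is None:
        return token
    # Assumption: the encoded bytes are UTF-8 text.
    return base64.b64decode(match.group(1)).decode("utf-8")


def parse_stream_tags(metric: str) -> tuple[str, dict[str, str]]:
    """Split '<metric name>|ST[<name>:<value>,...]' into (name, tags)."""
    name, sep, rest = metric.partition("|ST[")
    if not sep or not rest.endswith("]"):
        raise ValueError("missing |ST[...] section")
    tags = {}
    for pair in rest[:-1].split(","):
        if any(ch.isspace() for ch in pair):
            raise ValueError("whitespace is not allowed between tag delimiters")
        # Split on the first colon only: plain-text values may contain ':'.
        key, colon, value = pair.partition(":")
        if not colon:
            raise ValueError(f"expected <name>:<value>, got {pair!r}")
        tags[decode_token(key)] = decode_token(value)
    return name, tags


print(parse_stream_tags("CPU|ST[mode:wait_io,host:wnode15,role:webserver,zone:us-east1b]"))
```

Splitting on commas first is safe here because neither the base64 alphabet nor the plain-text character sets listed above include a comma.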

Search across Stream Tags easily with the new Metrics Explorer

You’re also able to search for matching Stream Tags to filter and aggregate metrics, display them in graphs, set rules for alerts based on the search results, and more. This search can be performed by matching the <tag-name> and <tag-value> strings or base64 encoded strings.

EXAMPLE:

(node:wnode15) // string match
(b"X19uYW1lX18=") // base64 string match
(b/X.+/) // base64 <regex> match

So What’s the Take-Away?

Stream Tags enable you to manage your dynamic infrastructure, while providing flexibility for future paradigms you’ll need to adopt. You can keep up with the ever-changing monitoring landscape with better transparency, more effective searches and comparisons, faster responses, and automatic metric aggregation. Stream Tags provide all the control you need to manage diverse, high-cardinality sets for your operational metrics. They give you the power of the Metrics 2.0 specification, plus the forward-looking capabilities provided by base64-capable tag names and values.

That’s what’s in it for you. Overall, this rich functionality will save you time and effort, enable agility to keep up with changing technology, and deliver better results than the conventional alternative. Stream Tags are expected to land on November 19th. We look forward to hearing how this feature helps each of your respective organizations.