OpenMetrics

Here comes OpenMetrics.

It’s a huge effort to create a standard specifically designed for exposing metric data. It also aims to provide tools that go beyond metrics, since observability is not just about metrics but also about logging events and correlating things.

At the moment it’s still a draft, currently sandboxed by the CNCF, but the final goal is to have an RFC.

By the way, please don’t ask me when!

Anyway, don’t get scared. It is not complex, and it’s not really something you’ll need to learn from scratch. It’s just a wire protocol, a lingua franca for the whole observability story, built using the Prometheus exposition format as the starting point.

The journey so far

Everyone knows the Prometheus exposition format nowadays, right?

It simply displays metrics line-by-line in a text-based format, and supports the histogram, gauge, counter, and summary metric types.

And yeah, it’s true that Prometheus is really good at doing metrics, and … nothing else. 🤷‍♂️

It has been adopted broadly, and really fast, because its format is simple and easy to read, and because it enforces labels rather than hierarchies. But also because there was nothing else out there in the field …

What did we have before Prometheus?

The awesome SNMP? 🤦‍♂️ Then? Other proprietary formats: with missing docs, difficult to implement, with hierarchical and rigid data models, or with all of these things together?

So someone could argue it has been an easy victory! But rest assured there are no easy wins in computers. At least not lasting ones …

The standardization

So why do we need OpenMetrics? The truth is that we don’t.

The world needs it, particularly the part of the world which is not cloud-native. I mean the traditional networking, storage, hardware vendors.

Yeah, politics is part of the world: traditional vendors want to avoid lock-in, and they don’t want to appear to be supporting external (or maybe competing) products. They usually prefer to use official standards only, and that’s perfectly fine.

So OpenMetrics is about enabling all systems to ingest and emit data in a certain wire format, and about agreeing on what that wire format should be, so that they can talk to each other about themselves over HTTP. It does not intend to prescribe what you must do on the other end. Its aim is to introduce the concept of n-dimensional spaces via labels into the world. And by doing this, it will totally kill the concept of hierarchical data models, which is one of its goals.
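To make the "n-dimensional space via labels" idea a bit more concrete, here is a minimal sketch in plain Python. The metric name, the label names and the values are all made up for illustration; the point is only that, with labels, any dimension can be used for grouping, while a hierarchical name fixes the order of the dimensions once and for all.

```python
from collections import defaultdict

# Hypothetical samples: (metric name, labels, value).
# In a hierarchical model these would be names like "eu.a.200.http_requests".
SAMPLES = [
    ("http_requests_total", {"dc": "eu", "host": "a", "code": "200"}, 120.0),
    ("http_requests_total", {"dc": "eu", "host": "b", "code": "500"}, 3.0),
    ("http_requests_total", {"dc": "us", "host": "c", "code": "200"}, 80.0),
    ("http_requests_total", {"dc": "us", "host": "c", "code": "500"}, 1.0),
]


def sum_by(samples, label):
    """Sum sample values grouped by one label, ignoring all the other labels."""
    totals = defaultdict(float)
    for _name, labels, value in samples:
        totals[labels[label]] += value
    return dict(totals)


# The same data can be sliced along any dimension, in any order.
by_dc = sum_by(SAMPLES, "dc")
by_code = sum_by(SAMPLES, "code")
```

Grouping by data center or by status code is the same operation here; nothing in the data model privileges one label over another.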

The way to achieve this is to create an official open standard: a set of vendor-neutral guidelines, with no brand, available for all to read and implement, in order to foster cooperation on a problem we all have.

Novelties

So the Prometheus exposition format and OpenMetrics will mostly be the same except for some improvements, namely:

(probably) an official IANA port assignment

a registered content-type/mime-type

application/openmetrics-text

a new descriptor directive, UNIT, to represent the unit of a metric (it will work for all metric types except info and state sets)

# UNIT foo_seconds seconds

every single line will always end with a LINE FEED (i.e., \n), but metric sets will also need an end marker (i.e., # EOF) to help detect responses that got cut off

# HELP test_m Bla bla bla description no one really reads

# TYPE test_m counter

# UNIT potatoes 🥔

test_m{...} x

test_m{...} x

# EOF

UNIX timestamps in seconds

same escape rules for HELP directive and labels

better handling of whitespace between tokens
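The items above can be put together in a small sketch. It uses only the Python standard library and is not the reference implementation: the function names (escape, render_family, render_exposition, looks_complete) and the example metric are assumptions made up for illustration, and the details follow the draft as described in this post.

```python
def escape(value):
    """Escape backslash, double quote and line feed (same rules for HELP and labels)."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_family(name, mtype, help_text, unit, samples):
    """Render one metric family; samples is a list of (sample_name, labels, value)."""
    lines = [f"# HELP {name} {escape(help_text)}", f"# TYPE {name} {mtype}"]
    if unit:
        lines.append(f"# UNIT {name} {unit}")
    for sample_name, labels, value in samples:
        pairs = ",".join(f'{k}="{escape(v)}"' for k, v in labels.items())
        lines.append(f"{sample_name}{{{pairs}}} {value}")
    return lines


def render_exposition(families):
    """Terminate every line with a LINE FEED and close the set with # EOF."""
    lines = [line for family in families for line in family]
    lines.append("# EOF")
    return "".join(line + "\n" for line in lines)


def looks_complete(body):
    """True if the body ends with the # EOF marker line, i.e. it was not cut off."""
    return body == "# EOF\n" or body.endswith("\n# EOF\n")


# Hypothetical gauge whose name carries its unit as a suffix.
family = render_family(
    "queue_wait_seconds", "gauge", "Time spent waiting in the queue", "seconds",
    [("queue_wait_seconds", {"queue": "mail"}, 1.5)],
)
body = render_exposition([family])
```

A consumer can call looks_complete on what it received before parsing, and treat a missing marker as a truncated response rather than as a short but valid one.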

Also new metric types:

state sets, for representing enums, bitmasks and, more generally, booleans

# TYPE foo stateset

foo{entity="controller",foo="a"} 0

foo{entity="controller",foo="b"} 1

foo{entity="replica",foo="a"} 0

foo{entity="replica",foo="b"} 1

# EOF

info metrics, to expose textual information (a name, a version, and so on) and monitor how it changes over labels (and/or time)

# TYPE x info

x_info{entity="ctrl",name="pretty",version="8.1"} 1

x_info{entity="repl",name="prettier",version="8.2"} 1

gauge histograms, i.e., just histograms for gauges

unknown, i.e., what Prometheus calls an untyped metric
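As a rough illustration of the first two new types, here is a sketch that produces lines shaped like the examples above. The helper names (render_stateset, render_info) are hypothetical, not part of any client library, and label values are not escaped to keep the sketch short.

```python
def render_stateset(name, states, active, extra_labels=None):
    """One sample per possible state: 1 if the state is active, 0 otherwise."""
    lines = [f"# TYPE {name} stateset"]
    for state in states:
        labels = dict(extra_labels or {})
        labels[name] = state  # the state label is named after the metric itself
        pairs = ",".join(f'{k}="{v}"' for k, v in labels.items())
        lines.append(f"{name}{{{pairs}}} {1 if state in active else 0}")
    return lines


def render_info(name, labels):
    """A single sample with the _info suffix and a constant value of 1."""
    pairs = ",".join(f'{k}="{v}"' for k, v in labels.items())
    return [f"# TYPE {name} info", f"{name}_info{{{pairs}}} 1"]


stateset_lines = render_stateset("foo", ["a", "b"], {"b"}, {"entity": "controller"})
info_lines = render_info("x", {"entity": "ctrl", "name": "pretty", "version": "8.1"})
```

The interesting information in an info metric lives entirely in its labels; the value 1 is just there so the sample can be joined against other metrics.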

Then there are exemplars: at most one per bucket of a histogram (or gauge histogram).

They are a mechanism that allows you to attach the ID of a trace to a bucket, so that you can link directly to that trace.

For example, imagine you see that the latency in a bucket is more than 60 seconds and you want to know why this is happening. The exemplar gives you a trace ID, so you can jump straight to a trace that landed in that bucket.

foo_bucket{le="0.1"} 8 # {} 0.054

foo_bucket{le="1"} 10 # {id="9856e"} 0.67

foo_bucket{le="10"} 17 # {id="12fa8"} 9.8 1520879607.789

Clearly not every monitoring solution will support exemplars. In fact, they are designed to be optional.

For example, the plan of Prometheus at the moment is to ignore them. OpenCensus (and soon others), instead, will support them.
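To show how a consumer could either pick up or ignore exemplars, here is a minimal regex-based sketch using only the standard library. It is not a conforming parser: the names (EXEMPLAR_RE, LABEL_RE, parse_exemplar) are made up, and it assumes label values contain no escaped quotes or closing braces.

```python
import re

# <sample> # {<labels>} <value> [<timestamp>]
EXEMPLAR_RE = re.compile(
    r"^(?P<sample>.+?) # \{(?P<labels>[^}]*)\} (?P<value>\S+)(?: (?P<ts>\S+))?$"
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_exemplar(line):
    """Return (sample_part, exemplar); exemplar is None when the line has none."""
    match = EXEMPLAR_RE.match(line)
    if match is None:
        return line, None
    exemplar = {
        "labels": dict(LABEL_RE.findall(match.group("labels"))),
        "value": float(match.group("value")),
        "timestamp": float(match.group("ts")) if match.group("ts") else None,
    }
    return match.group("sample"), exemplar


sample, exemplar = parse_exemplar(
    'foo_bucket{le="10"} 17 # {id="12fa8"} 9.8 1520879607.789'
)
```

A system that does not care about exemplars can keep only the first element of the returned tuple, which is exactly the sample as it would look without them.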

State of the Art

Currently you can emit OpenMetrics with the Prometheus Python client 2.5 or greater, since it implements an experimental first reference parser of the existing draft of the text format.

Very important companies, namely Google and Uber, are supporting it and will soon implement other reference parsers.

OpenCensus, which is going to merge with OpenTracing, will incorporate it.

If you want to start implementing it, or porting your hand-made endpoints, there is a one-liner out there, using the aforementioned Prometheus Python client, to verify that they conform to the current draft.

from prometheus_client.openmetrics import parser

# The text must end with the "# EOF" marker and must not contain blank lines.
s = """test_packets{key="a",node="b"} 8
# EOF
"""

# Consuming the generator parses everything; malformed input raises an exception.
list(parser.text_string_to_metric_families(s))

Nothing more yet.

And, since it has been in the making for almost 2 years now, this is an issue!

We all know that simple looking things are usually difficult to do well.

What is missing is good, constant communication, and maybe tools like a test suite to develop against, to let the community help shape it for good.

And finally give birth to it!