Yesterday CoreOS released the metering Operator as part of its Operator Framework. The goal of the metering Operator is to let Kubernetes administrators find out different kinds of usage metrics for their clusters, with a focus on how cluster usage correlates with cloud resource usage.

Here is our initial analysis after going through the operator-metering code. (We have not tried the Operator yet; we will do an experimental evaluation soon.)

What is Metering?

Metering means collecting metrics data to answer usage questions such as: how much memory is being consumed by all Pods in a particular Namespace? Or, what is the average CPU usage of a particular Pod within some time window? One of the standard ways to collect metering data in Kubernetes is through Prometheus.
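To make these two questions concrete, here is a minimal sketch in Python that answers them over a handful of in-memory samples. The `Sample` record, its field names, and the sample values are our own illustration; they are not the metering Operator's data model or Prometheus's format.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical usage measurement for a Pod (illustrative only)."""
    namespace: str
    pod: str
    timestamp: int      # seconds since some reference point
    memory_bytes: int
    cpu_cores: float


def memory_by_namespace(samples, namespace, timestamp):
    """Total memory used by all Pods in a Namespace at one timestamp."""
    return sum(
        s.memory_bytes
        for s in samples
        if s.namespace == namespace and s.timestamp == timestamp
    )


def average_cpu(samples, pod, start, end):
    """Mean CPU usage of one Pod over the window [start, end], or None if no data."""
    values = [
        s.cpu_cores
        for s in samples
        if s.pod == pod and start <= s.timestamp <= end
    ]
    return sum(values) / len(values) if values else None


# Made-up samples, purely for demonstration.
SAMPLES = [
    Sample("prod", "web-1", 0, 100, 0.25),
    Sample("prod", "web-2", 0, 300, 0.5),
    Sample("prod", "web-1", 60, 150, 0.75),
    Sample("dev", "db-1", 0, 500, 1.0),
]

print(memory_by_namespace(SAMPLES, "prod", 0))   # 400
print(average_cpu(SAMPLES, "web-1", 0, 60))      # 0.5
```

In a real cluster the samples would come from a metrics source such as Prometheus rather than a hard-coded list.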

What is the metering Operator?

The metering Operator is a Kubernetes custom controller that supports collecting metrics from different data sources and creating usage reports by running custom queries on the collected metric data. Currently it supports Prometheus and AWS billing reports as data sources. Its operation is defined by six custom resources: ReportPrometheusQuery, ReportDataSource, ReportGenerationQuery, ScheduledReport, Report, and StorageLocation. There is good documentation that explains the purpose of each of these custom resources. However, we felt that it does not give a clear picture of how the various custom resources are related, so we created the following diagram showing the relationships between them. Use this picture in conjunction with the GitHub documentation to understand how the various custom resources are handled by the metering Operator.

Observations and Analysis:

1) How is this different from directly querying Prometheus or using Kubernetes’s metrics server?

The main difference, we think, is the ability to collect metrics and generate advanced usage reports without any out-of-band automation. Without the metering Operator, you would have to write custom logic to periodically retrieve Prometheus data, store it in some database, and then periodically run queries on it to generate the required reports. The metering Operator avoids all of this. It stores data in the Kubernetes cluster itself using Presto+HDFS, the ReportGenerationQuery custom resource supports defining custom SQL queries, and the reports generated by the Report and ScheduledReport custom resources are also stored on the cluster (this can be changed to use S3 for storage instead).
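As a rough illustration of the kind of glue code this replaces, the sketch below covers the "store it in some database, then run queries on it" half of a hand-rolled pipeline. It is an assumption-laden stand-in: SQLite replaces the database, hard-coded tuples replace data retrieved from Prometheus, and the table and function names (`pod_memory`, `store_samples`, `namespace_report`) are hypothetical. It does not reflect how the metering Operator itself is implemented.

```python
import sqlite3


def store_samples(conn, samples):
    """Persist (namespace, pod, ts, bytes) tuples; a stand-in for a periodic scrape job."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pod_memory "
        "(namespace TEXT, pod TEXT, ts INTEGER, bytes INTEGER)"
    )
    conn.executemany("INSERT INTO pod_memory VALUES (?, ?, ?, ?)", samples)
    conn.commit()


def namespace_report(conn, start, end):
    """Average per-sample memory for each namespace within [start, end]."""
    rows = conn.execute(
        "SELECT namespace, AVG(bytes) FROM pod_memory "
        "WHERE ts BETWEEN ? AND ? "
        "GROUP BY namespace ORDER BY namespace",
        (start, end),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Made-up samples standing in for data pulled from Prometheus.
    store_samples(conn, [
        ("prod", "web-1", 0, 100),
        ("prod", "web-2", 0, 300),
        ("dev", "db-1", 0, 500),
        ("prod", "web-1", 120, 900),
    ])
    print(namespace_report(conn, 0, 60))
```

A production version would also need scheduling, retries, and retention handling for both the collection and the reporting step, which is the out-of-band automation described above.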

2) The Operator seems to have been developed from first principles of designing a Kubernetes custom controller. It is not written using the Operator SDK, probably because the Operator SDK was itself still being built while the metering Operator was under development. Checking whether the Operator SDK can handle creating something like this metering Operator would be a good evaluation of the SDK.

3) It was not clear how to integrate the metering Operator with our own Operators. For example, we have a Postgres Operator that works with a Postgres custom resource. It generates Deployment, Service, and Pod objects. It is not clear how to use metering for the Pods that are generated by this Operator. Our guess is that the Pods may need to be labeled in a certain way so that a ReportPrometheusQuery can be written to query just those Pods.

4) One suggestion we have is to include an end-to-end example that shows how to construct a report correlating AWS billing with the metrics collected from Prometheus.
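To show what we mean by querying on labels, here is a small Python sketch that renders a Prometheus query with label matchers. This only illustrates our guess. The `app` label is hypothetical, the default grouping label `pod_name` is an assumption that depends on the cluster's Prometheus setup, and whether Pod labels are actually present on the series depends on how Prometheus is configured. We have not verified that a ReportPrometheusQuery would accept this query as is.

```python
def escape_label_value(value):
    """Escape backslashes, double quotes, and newlines for a PromQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def selector(metric, labels):
    """Render metric{name="value",...} with matchers in sorted order."""
    matchers = ",".join(
        f'{name}="{escape_label_value(value)}"'
        for name, value in sorted(labels.items())
    )
    return f"{metric}{{{matchers}}}"


def memory_by_pod_query(labels, group_by="pod_name"):
    """Sum container memory usage per Pod for series matching the given labels.

    group_by is an assumption: the label that carries the Pod name varies
    between setups.
    """
    inner = selector("container_memory_usage_bytes", labels)
    return f"sum({inner}) by ({group_by})"


# Hypothetical labels for Pods created by a Postgres Operator.
print(memory_by_pod_query({"namespace": "prod", "app": "postgres"}))
```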

Conclusion:

The metering Operator seems interesting. It will be especially useful to those who want to use Kubernetes as the single touchpoint for driving advanced metric gathering and report generation related to their cluster’s resource usage.

www.cloudark.io