Let me tell you a short story about how we gathered WebRTC stats for Rumpus in just a day. By the way, if you haven’t yet, try Rumpus out; it’s probably the most fun collaboration tool out there.

A colleague and I were planning how to add monitoring to the Rumpus backend, and we talked briefly about using the USE or RED methods. At some point we realized we first wanted to know how Rumpus clients were behaving: what bitrate they were sending and receiving streams at, whether they were experiencing packet loss, and so on. So, instead of planning, we just decided to go for it.

The Rumpus desktop app is built with Electron, which means we could access RTCPeerConnection stats very easily using getStats(). On the backend we have a gRPC API server, and we already have Prometheus and Grafana set up. So the only things we had to do were get the stats from the RTCPeerConnections on the client, define a new gRPC method to receive them, send the stats to Prometheus, and add some nice dashboards to Grafana.

Defining the new gRPC API method

The first thing we did was define the new method and types to add to the API. We thought gRPC client streaming would be ideal for this use case: the client would open a stream at the beginning of a meeting and keep sending stats until the meeting ends. This is what the gRPC protobuf spec looked like:
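A minimal sketch of such a spec follows; the message and field names here are illustrative guesses based on the stats discussed below, not the actual Rumpus API (which carried more stats types):

```protobuf
syntax = "proto3";

package rumpus;

service RumpusAPI {
  // Client-streaming RPC: the client keeps the stream open for the
  // duration of a meeting and periodically writes stats messages.
  rpc SendWebRTCStats(stream WebRTCStats) returns (SendWebRTCStatsResponse);
}

message WebRTCStats {
  string meeting_id = 1;
  string display_name = 2;
  // Only one stats type is sent per message.
  oneof stats {
    OutboundRTPStreamStats outbound_rtp = 3;
    InboundRTPStreamStats inbound_rtp = 4;
  }
}

message OutboundRTPStreamStats {
  uint32 ssrc = 1;
  string media_type = 2; // "audio" or "video"
  uint64 bytes_sent = 3;
  uint64 packets_sent = 4;
}

message InboundRTPStreamStats {
  uint32 ssrc = 1;
  string media_type = 2;
  uint64 bytes_received = 3;
  uint64 packets_received = 4;
  uint64 packets_lost = 5;
}

message SendWebRTCStatsResponse {}
```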

As you can see, there’s a new client streaming RPC called SendWebRTCStats that receives a stream of WebRTCStats. One thing to note about WebRTCStats is the oneof field: the client sends only one stats type at a time instead of all the stats at once, simply because that was easier to do on the client.

UPDATE (08/19/2019): Note that the WebRTCStats described in the protobuf above are not complete. Many more stats are available and they are in constant flux, so it all depends on your browser version.

Sending WebRTC stats

On the Rumpus client we needed to use gRPC client-side streaming to send stats, for example, every other second. One small roadblock was figuring out how to do that from JavaScript (actually, we use TypeScript). The following code will give you an idea of how we did it:
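A sketch of the streaming setup, assuming stubs generated from the protobuf spec with grpc-tools (`RumpusAPIClient`, `WebRTCStats`, and the endpoint are hypothetical names, not the real Rumpus code):

```typescript
import * as grpc from '@grpc/grpc-js';
// Hypothetical generated stubs from the protobuf spec.
import { RumpusAPIClient } from './generated/rumpus_grpc_pb';
import { WebRTCStats } from './generated/rumpus_pb';

const client = new RumpusAPIClient(
  'api.rumpus.example:443', // hypothetical endpoint
  grpc.credentials.createSsl(),
);

// Open the client-side stream once, at the start of a meeting.
// The server responds only when the stream is closed.
const stream = client.sendWebRTCStats((err, _response) => {
  if (err) {
    console.error('stats stream closed with error', err);
  }
});

// Write one stats message to the open stream.
export function sendStats(stats: WebRTCStats): void {
  stream.write(stats);
}

// Half-close the stream when the meeting ends so the server
// can send its final response.
export function endStatsStream(): void {
  stream.end();
}
```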

The next thing was to simply get the stats from the RTCPeerConnection every second and send them to the backend with the code we just created above:
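A sketch of that polling loop, assuming a `sendStats` helper that writes one report to the already-open gRPC stream (the filtering to RTP stream reports is our illustrative choice, matching the stats charted below):

```typescript
// Poll RTCPeerConnection.getStats() once a second and forward each
// outbound/inbound RTP report to the backend. Returns a function the
// caller invokes when the meeting ends to stop polling.
function startStatsPolling(
  pc: RTCPeerConnection,
  sendStats: (report: RTCStats) => void,
): () => void {
  const timer = setInterval(async () => {
    const report = await pc.getStats();
    report.forEach((stats) => {
      // Forward only the RTP stream stats we chart in Grafana.
      if (stats.type === 'outbound-rtp' || stats.type === 'inbound-rtp') {
        sendStats(stats);
      }
    });
  }, 1000);
  return () => clearInterval(timer);
}
```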

Receiving WebRTC stats

Our backend is mostly written in Go, so we just needed to add the new SendWebRTCStats implementation. In Go this is pretty straightforward; it looked something like this:
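A sketch of the handler, assuming `pb` is the package generated from the protobuf spec and `recordStats` is a hypothetical helper that pushes values into Prometheus:

```go
package api

import (
	"io"

	pb "rumpus/generated" // hypothetical import path for the generated code
)

type server struct{}

// SendWebRTCStats receives the client-side stream and records each
// stats message until the client closes the stream.
func (s *server) SendWebRTCStats(stream pb.RumpusAPI_SendWebRTCStatsServer) error {
	for {
		stats, err := stream.Recv()
		if err == io.EOF {
			// The client half-closed the stream: the meeting is over.
			return stream.SendAndClose(&pb.SendWebRTCStatsResponse{})
		}
		if err != nil {
			return err
		}
		recordStats(stats) // forward to the Prometheus GaugeVec
	}
}
```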

After receiving the data we just needed to send it to Prometheus. Probably the most challenging part here was deciding what data to send and how to send it. Since all WebRTC stats values can go up and down (even though some, like packets sent/received, only go up), we used a GaugeVec with some labels (e.g. meeting ID, SSRC, display name, etc.).
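A minimal sketch of such a GaugeVec, with label names chosen to match the Grafana query shown below (the `job` label comes from the Prometheus scrape config, not from the GaugeVec; `recordStat` is a hypothetical helper):

```go
package api

import (
	"github.com/prometheus/client_golang/prometheus"
)

// One GaugeVec holds every WebRTC stat; the "metric" label
// distinguishes bytes_sent, packets_lost, and so on.
var webrtcStats = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "webrtc_stats",
		Help: "WebRTC stats reported by Rumpus clients.",
	},
	[]string{"meetingID", "ssrc", "display_name", "media_type", "direction", "metric"},
)

func init() {
	prometheus.MustRegister(webrtcStats)
}

// recordStat sets the latest value for one stat of one RTP stream.
func recordStat(meetingID, ssrc, displayName, mediaType, direction, metric string, value float64) {
	webrtcStats.
		WithLabelValues(meetingID, ssrc, displayName, mediaType, direction, metric).
		Set(value)
}
```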

Creating the dashboards

Now that the data was in Prometheus we just needed to add a new Grafana dashboard for our WebRTC stats and add a bunch of queries for the data we were interested in.

For example, our Grafana query to get audio outbound bitrate looked something like this:

sum(rate(webrtc_stats{job="$job",
  meetingID=~"[[meetingID]]",
  metric=~"bytes_sent",
  media_type=~"audio",
  direction=~"outbound"}[1m]) * 8)
by (ssrc, display_name)

You can see how we use webrtc_stats, which is the name we specified for our GaugeVec, and how we use the labels to select the values we are interested in. That is, media_type is fixed to “audio”, direction to “outbound”, and metric to “bytes_sent”, since in this case we are interested in the audio outbound bitrate. Note how these all correspond to the labels we defined in our GaugeVec. The first picture in the article shows what the new WebRTC Stats dashboard looked like.

Last words

This was a lot of fun to hack! We went from discussing Rumpus WebRTC client stats to implementing them in just a day. Some may ask why we didn’t use an existing service like callstats.io. One reason is that we could do it very quickly, since we had all the tools in place (gRPC server, Prometheus, and Grafana). Another is that the data we have is good enough for our use case: we can tell whether clients have network issues during a meeting, what bitrate they are sending and receiving, and so on. Also, since all our data is in one place, it will be easy to aggregate it and create new dashboards in the future. In the end, it will all depend on how things go; we might need a whole team devoted to this, or we might simply use an existing service. In the meantime we will be enjoying our new WebRTC Stats dashboard!