08/10/2018

WebRTC vs Zoom? WebRTC is actually quite good. But you knew that already – didn’t you? 🙂

They say quality is in the eye of the beholder. So behold.

We’ve all been told time and again that this or that video conferencing vendor works great. They offer the best quality. The best experience. They work in conditions that others don’t.

I even had a call once with an entrepreneur who explained to me how he was going to offer a service with better 1:1 video quality than Skype and Google Hangouts. And he was going to do it with WebRTC. I spent the better part of that call trying to talk him out of that idea (something about his logic was off there).

But I am digressing.

Like many others, I’ve been told time and again how great Zoom is. How in spite of the fact that it doesn’t work in the browser and forces you to download its client (some even refer to it as a virus), it gets traction and adoption. It feels like it is the best game in town. And then they mention the reasons:

It’s free (until it isn’t, which is a great business model if you can make it work, and Zoom is making it work)

It has better video quality than the competition. Especially WebRTC

I am not the only one who needs to listen to it, and even believe it to some extent. The guys at Jitsi got curious – why not put it to the test?

So they took a Mac device, placed it on a WiFi network, added a network limiter so they can fiddle with the network configuration, and did a 1:1 call. Once with Zoom. And once with WebRTC.

The idea is this: start with as much bandwidth as the video call wants. Then limit it to 500kbps. Check how much time it takes to adapt. Remove the limit and check how much time it takes to adapt back. More about it in Jitsi’s blog.

Essentially, testing for these network conditions:

The longer the marked areas, the worse the experience is going to be for the users.
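To make "how much time it takes to adapt" concrete, here is a minimal sketch of one way to measure it from a series of bitrate samples. The function name, the 10% tolerance, and the requirement of three consecutive in-range samples are all assumptions made for illustration; they are not taken from Jitsi's test, and the trace is made up.

```javascript
// Hypothetical helper: none of these names come from Jitsi's test or testRTC.
// samples: [{ t: seconds, kbps: measured bitrate }], ordered by time.
// Returns the seconds from `changeAt` until the bitrate first enters a run of
// `holdSamples` consecutive samples within `tolerance` of `targetKbps`,
// or null if it never settles.
function adaptationTime(samples, changeAt, targetKbps, tolerance = 0.1, holdSamples = 3) {
  let runStart = null; // time of the first sample in the current in-range run
  let runLength = 0;
  for (const { t, kbps } of samples) {
    if (t < changeAt) continue;
    const inRange = Math.abs(kbps - targetKbps) <= tolerance * targetKbps;
    if (!inRange) {
      runStart = null;
      runLength = 0;
      continue;
    }
    if (runStart === null) runStart = t;
    runLength += 1;
    if (runLength >= holdSamples) return runStart - changeAt;
  }
  return null;
}

// Made-up trace: a 500 kbps limit is applied at t = 10 seconds.
const trace = [
  { t: 0, kbps: 2000 }, { t: 5, kbps: 2000 }, { t: 10, kbps: 2000 },
  { t: 15, kbps: 1200 }, { t: 20, kbps: 700 }, { t: 25, kbps: 520 },
  { t: 30, kbps: 500 }, { t: 35, kbps: 495 },
];
console.log(adaptationTime(trace, 10, 500)); // 15
```

Requiring a few consecutive in-range samples keeps a single lucky reading from counting as "adapted"; a real measurement would need to pick the tolerance and window to suit its sampling rate.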

And guess what? Zoom fared worse than WebRTC. Not a little, but a lot worse.

Full adaptation to limiting the bandwidth took WebRTC 20 seconds. It took Zoom 156 seconds (!).

Ramping back up to 2Mbps took WebRTC 32 seconds. It took Zoom 62 seconds.
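For a sense of scale, the four figures above can be turned into ratios. This is only arithmetic on the numbers quoted from Jitsi's test; the object and function names are made up for the example.

```javascript
// Seconds to adapt, as quoted above from Jitsi's test.
const results = {
  limitTo500kbps: { webrtc: 20, zoom: 156 },
  rampUpTo2Mbps: { webrtc: 32, zoom: 62 },
};

// How many times longer Zoom took than WebRTC in a given phase.
function slowdown({ webrtc, zoom }) {
  return zoom / webrtc;
}

for (const [phase, times] of Object.entries(results)) {
  console.log(`${phase}: Zoom took ${slowdown(times).toFixed(1)}x as long`);
}
// limitTo500kbps: Zoom took 7.8x as long
// rampUpTo2Mbps: Zoom took 1.9x as long
```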

Now here’s my analysis of this.

WebRTC Rocks

Yep. It really does.

The screen capture from that Zoom blog post that was pasted by Jitsi?

Stating that “web-RTC is a very limited solution that would not allow us to provide all the excellent features that our users have come to expect from us”?

That’s from 2015.

A lot has improved in WebRTC since then, if that explanation was even correct in 2015 to begin with.

Without the need for most of us to do anything, we’re getting updates to a top notch media engine in the form of WebRTC inside the browsers we use. The code used in Chrome is open source, so anyone can embed it in their own applications as well.

Security fixes? New codecs? Improved media algorithms? They just “happen”. Out of thin air. For most of us.

Defending Zoom

If I look at it from Zoom’s point of view, besides the fact of being a dominant player in the market with or without WebRTC, here are the challenges with such a test scenario:

It was done once, or a few times. But it is still only one scenario

It wasn’t a real life scenario. Just something concocted for this. Jitsi could have rigged it and tweaked it so that WebRTC would shine, but in real life, that doesn’t happen, and at Zoom we’re optimizing for real life scenarios (that isn’t really so: from my experience and knowledge of the Jitsi team, I’d estimate they tried to be VERY careful not to fall into that trap. And what are real life scenarios anyway?)

The network limiter used changes behavior in ways that aren’t close enough to reality (that I can understand and live with. We see faster uptake of the same type of scenarios for WebRTC at testRTC – more on that later)

Zoom might be working through external remote servers for that same session while WebRTC is going peer to peer on the local network. Servers behave differently than clients, so the results seem somewhat “off”

In other scenarios, Zoom might actually be better than WebRTC

Which leads us to the fact that more tests are needed to know which one is best and in which scenarios.

This starts to sound like the VP8 vs H.264 quality comparisons of the past (I never could tell the difference).

It’s the Infrastructure Stupid

With WebRTC, it all boils down to the infrastructure. The one with the better deployment wins the quality game.

Do you go peer to peer for 1:1 sessions and seamlessly switch to an SFU architecture when more participants join? Where are your media servers located? Do you cascade the session across media servers to improve quality? Do you provide feedback to the user about the network conditions? Do you switch video off when there’s not enough bandwidth? How are you managing things like FEC, simulcast, SVC, … ? What about mobile and native app support?

And the list goes on.
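Two of those questions, peer to peer versus SFU and switching video off on a poor network, boil down to small policy decisions. The sketch below shows what such decisions might look like; every name and threshold in it is hypothetical and picked only for illustration, not taken from any vendor's implementation.

```javascript
// All names and thresholds here are hypothetical, for illustration only.
const MIN_VIDEO_KBPS = 150; // assumed floor below which video is switched off

// Peer to peer for 1:1 sessions, SFU once a third participant joins.
function chooseTopology(participantCount) {
  return participantCount <= 2 ? 'p2p' : 'sfu';
}

// Decide which media to send given an estimate of the available bandwidth.
function chooseMedia(availableKbps) {
  return {
    audio: true, // audio is always kept in this sketch
    video: availableKbps >= MIN_VIDEO_KBPS,
  };
}

console.log(chooseTopology(2), chooseTopology(3)); // p2p sfu
console.log(chooseMedia(500).video, chooseMedia(100).video); // true false
```

In a browser, the video decision could be applied, for example, by setting `active: false` on the video sender's encoding through `RTCRtpSender.setParameters()`. The hard part in practice is the seamless switch between topologies, which this sketch does not attempt.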

With vendors who use proprietary codecs and transport protocols, this is doubly so, as they need to cater for the browser once they reach WebRTC. So while their native apps might be optimized, it might all go down the drain once they transcode or just “translate” to reach the browser using WebRTC.

Which brings us to why someone like Zoom should use WebRTC and think about the quality issues when connecting to it:

You Need WebRTC

UPDATE: This section was updated to reflect what was later found while investigating what Zoom is doing. Head over to webrtcHacks for the full story.

Zoom supports something like WebRTC. I just found out when I searched for stuff to write this article: there’s a Zoom Web Client

It runs on Chrome and enables using audio in Chrome when joining meetings. No video, probably because transcoding the proprietary video codec Zoom uses to the ones in WebRTC is too complicated, but using G.711 or Opus in the browser and transcoding or using the same in Zoom is way simpler.

It tries its best NOT to use WebRTC and still get something working on the browser, which is no easy feat.

I expect Zoom to eventually go through the same phases that Amazon did with Chime:

Amazon Chime started with a downloadable client

They then added limited browser support that enabled users to view the screen shared in the browser and connect via the phone without the need to download the client

Later on, audio support was added to the web client

And recently, video got supported

Screen sharing and remote desktop control still don’t work. I’d say it is a matter of time

This exact same path has been happening to other vendors in one way or another.

Why not Check Your Own Service?

While writing this article, it dawned on me that this is one of those scenarios that is ridiculously easy to simulate using testRTC, so I went ahead and created a script that does just that:

Load up Jitsi with 2 participants. That should cause them to work peer-to-peer

Run the call for 1 minute unhindered

Limit bitrate to 500kbps and run for 2 more minutes

Remove bitrate limit and run for 2 more minutes

Here’s what the main part of the script looks like:

    // Wait for 1 minute
    client
        .pause(60*sec)
        .rtcScreenshot('ALL GOOD');

    if (probeType === 1) {
        client
            .rtcEvent('Start limit', 'global')
            .rtcSetNetworkProfile('custom', 'bandwidth', 500000, 'both', 'both');
    }

    // 2 minutes with bandwidth limits
    client
        .pause(60*sec)
        .rtcScreenshot('LIMITED')
        .pause(60*sec);

    if (probeType === 1) {
        client
            .rtcSetNetworkProfile('') // back to pristine network conditions
            .rtcEvent('Stop limit', 'global');
    }

    client // 2 more minutes unlimited
        .pause(60*sec)
        .rtcScreenshot('BACK TO NORMAL')
        .pause(60*sec);

The .rtcEvent() calls are there to place vertical lines on the graphs, while .rtcSetNetworkProfile() is there to fiddle around with the network conditions.

There were two probes here, each one a participant in the call. The first one is the one I limited while the second one was left “untouched”.

Here’s what the graphs look like on the second probe:

The above graph shows the outgoing bitrate. Within a span of 5 seconds, WebRTC finds out the new effective bitrate and adapts to it. Ramping back up takes some 20 seconds.

The above graph shows the incoming frame rate. You can see how frame rate reporting in WebRTC takes a bit of time to get back to its usual self – also some 20 seconds or so.
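For readers who want to reproduce a bitrate graph like this without testRTC, here is a minimal sketch of how an outgoing bitrate can be derived from two successive statistics snapshots. It assumes each snapshot is an `outbound-rtp` entry from WebRTC's statistics, which carries `bytesSent` and a `timestamp` in milliseconds. How testRTC computes its own graphs is not stated here, so treat this as one plausible approach; the function name and sample values are made up.

```javascript
// Sketch only: `prev` and `curr` stand for two snapshots of an
// `outbound-rtp` stats entry (fields `bytesSent` and `timestamp` in ms).
function outgoingBitrateKbps(prev, curr) {
  const elapsedMs = curr.timestamp - prev.timestamp;
  if (elapsedMs <= 0) return 0; // guard against duplicate or out-of-order samples
  const bits = (curr.bytesSent - prev.bytesSent) * 8;
  return bits / elapsedMs; // bits per millisecond equals kilobits per second
}

// Made-up snapshots taken one second apart.
const before = { timestamp: 1000, bytesSent: 100000 };
const after = { timestamp: 2000, bytesSent: 162500 };
console.log(outgoingBitrateKbps(before, after)); // 500
```

In a browser, the snapshots would come from calling `RTCPeerConnection.getStats()` periodically and picking the entries whose `type` is `'outbound-rtp'`.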

I wanted to check how the Jitsi SFU would behave, so I tweaked the test URL for that. The results? Still better than the Zoom one. 20 seconds to hit 30 frames per second and around 50 seconds to get back to full bitrate.

If you want to try it yourself, just import the JSON file in this Google Drive folder to your testRTC account and modify it to fit your needs.

Where to now?

WebRTC is more than good enough.

Making it better is usually about thinking your way through the best possible architecture, along with media servers that take care of network conditions properly.

As for Zoom… please make sure your next call with me is on something that has WebRTC. The machine I regularly use for calls runs Linux. Zoom doesn’t work there… it doesn’t really support Chrome or Linux. Yet.