Work around cross-origin restrictions

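For those not familiar, browsers enforce a same-origin policy that prevents code on one website from directly acting upon another website’s information. Cross-origin resource sharing (CORS) is the related mechanism that lets a server selectively relax some of those restrictions.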

Consider the following scenario:

Two websites exist: “Site A” and “Site B.”

“Site B” displays sensitive information and has an interest in protecting that data.

“Site A” embeds an <iframe> whose URL points to “Site B.” In essence, “Site A” is hosting “Site B” within itself.

Intuitively, “Site A” ought to be able to programmatically access the information within “Site B,” since “Site B” is embedded inside “Site A” via the <iframe>. In practice, because the two sites have different origins, the browser denies “Site A” the ability to read that data.

This is all well and good for security, but for the task at hand it is an obstacle to overcome. As the name implies, YouTube’s IFrame API controls an embedded YouTube player that lives inside an <iframe> served from YouTube’s own origin. That player is guarded by the same cross-origin restrictions, so we are unable to act upon it directly.

One way of working around these restrictions is asynchronous message passing. An external website, such as “Site A,” can call window.postMessage on the embedded window.

The window.postMessage method safely enables cross-origin communication.

“Site A” can request that “Site B” perform an action. If “Site B” is listening for messages and wishes to honor the request, it can respond by posting a message back.
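Here is a minimal sketch of the receiving side, “Site B,” under a few assumptions. The origin https://site-a.example, the { type: "getTitle" } message shape, and the handleRequest helper are all hypothetical. The habit worth copying is checking event.origin before honoring anything, because any window can post a message.

```javascript
// Hypothetical origin of "Site A"; only messages from it are honored.
const TRUSTED_ORIGIN = "https://site-a.example";

// Decide how to answer a message event. Returns the reply payload,
// or null if the request should be ignored.
function handleRequest(event, pageTitle) {
  if (event.origin !== TRUSTED_ORIGIN) {
    return null; // Unknown sender: ignore it.
  }
  const data = event.data;
  if (data && data.type === "getTitle") {
    return { type: "title", value: pageTitle };
  }
  return null; // Not a request we know how to honor.
}

// Browser wiring for "Site B" (skipped when not running in a browser).
if (typeof window !== "undefined") {
  window.addEventListener("message", (event) => {
    const reply = handleRequest(event, document.title);
    if (reply) {
      // Answer the sender, addressed only to the origin we verified.
      event.source.postMessage(reply, event.origin);
    }
  });
}
```

On the “Site A” side, the request would go out with something like iframe.contentWindow.postMessage({ type: "getTitle" }, "https://site-b.example"). The reply would then arrive at a message listener on Site A’s own window.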

Unfortunately, YouTube’s <iframe> doesn’t give a damn about us. We need to encourage it a bit.

Enter Chrome extension content scripts.

Content scripts can extend the functionality of any web page whose URL matches patterns the extension declares. The only caveat is that the user must grant the extension permission to do so.

Let’s look at some code, starting with manifest.json. Chrome extensions use a manifest file to declare what they need, such as permissions and scripts, before they are installed.

Declaring our intent to inject arbitrary JavaScript into YouTube pages gives us a mechanism for interacting with their data more closely. We’ll still need to communicate through window.postMessage, but at least we’re talking!
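A declaration like this could look roughly as follows. This minimal sketch is written in the current Manifest V3 format, which is an assumption; the original extension may have used the older V2 format. The name and version are placeholders, and the match pattern assumes YouTube’s embedded player is served from https://www.youtube.com/embed/.

Two settings matter here:

- "all_frames": true lets the content script run inside the embedded <iframe>, not only in the top-level page.
- "run_at": "document_start" injects the script before the frame’s other scripts run.

```json
{
  "manifest_version": 3,
  "name": "Example YouTube Capture",
  "version": "0.1",
  "content_scripts": [
    {
      "matches": ["https://www.youtube.com/embed/*"],
      "js": ["youTubeIFrameInject.js"],
      "all_frames": true,
      "run_at": "document_start"
    }
  ]
}
```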

Intercept communications with video servers

It’s great that we have a way of communicating with YouTube’s embedded website, but it’s not much use unless we have something to chat with it about!

How are we going to get the data we’re interested in?

YouTube’s API is a staggering ~50,000 lines of code. Yeah. Fifty thousand lines. Minified. Are we really going to try and read, digest, and modify their source? That would be crazy. Only someone really, really stupid would try to do that…

…Your humble author found it to be an interesting experience. I wouldn’t recommend it to anyone who values their time, but I did end up learning an immense amount regarding the inner workings of YouTube’s IFrame API.

Eventually, it dawned on me that there was a much simpler solution: override YouTube’s usage of XMLHttpRequest and provide our own, custom implementation.
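The core idea can be sketched as a function that swaps in a subclass of whatever XMLHttpRequest constructor it is handed. This illustrates the technique; it is not the author’s interceptor.js. installInterceptor and onResponse are hypothetical names. The global object is passed in as a parameter only so the sketch can be exercised outside a browser; in the injected script you would pass window.

```javascript
// Replace target.XMLHttpRequest with a subclass that reports every
// completed response to onResponse and otherwise behaves like the original.
function installInterceptor(target, onResponse) {
  const NativeXHR = target.XMLHttpRequest;

  class InterceptedXHR extends NativeXHR {
    constructor(...args) {
      super(...args);
      // Registered at construction, i.e. before the page calls send().
      this.addEventListener("load", () => onResponse(this));
    }
  }

  target.XMLHttpRequest = InterceptedXHR;
  // Hand back a function that undoes the override.
  return () => {
    target.XMLHttpRequest = NativeXHR;
  };
}

// In the page: installInterceptor(window, (xhr) => { /* inspect xhr.response */ });
```

Subclassing leaves every native method untouched. The only addition is a load listener registered as each request is constructed, before YouTube’s code gets a chance to call send().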

First and foremost, we’re going to need permission to make this happen, in the form of a web_accessible_resources entry in manifest.json.

Declaring interceptor.js as a web accessible resource allows our content script to load it into the page with an ordinary <script> tag. Why is that useful? Code injected this way runs in a different context than our content script.

Content scripts run in an isolated world. They share the page’s DOM and can use some Chrome extension APIs, but they cannot see the JavaScript variables the page itself defines, so replacing XMLHttpRequest from a content script would not affect the page’s copy. Conversely, scripts injected as web accessible resources have no access to Chrome extension APIs, but they run alongside the page’s own code and can modify its globals.
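In Manifest V3 terms, the permission might look like the fragment below, merged into the manifest’s top level. The matches pattern is an assumption that the pages loading the file are on https://www.youtube.com. In the older Manifest V2 format, web_accessible_resources was a plain array of file names instead.

```json
{
  "web_accessible_resources": [
    {
      "resources": ["interceptor.js"],
      "matches": ["https://www.youtube.com/*"]
    }
  ]
}
```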

Now, have our content script, youTubeIFrameInject.js, inject interceptor.js into YouTube’s iframe.
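The injection step might look like the sketch below. injectPageScript is a hypothetical helper; chrome.runtime.getURL is the real extension API that turns a packaged file name into a URL the page can load.

One caveat applies. A script added via src is fetched and run asynchronously, so it is not strictly guaranteed to run before every inline script YouTube executes. In Manifest V3, declaring a content script with "world": "MAIN" is an alternative worth knowing about.

```javascript
// youTubeIFrameInject.js (content script): a minimal sketch.
// Adds a <script src="..."> element so the file runs in the page's own
// JavaScript context rather than in the content script's isolated world.
function injectPageScript(doc, src) {
  const script = doc.createElement("script");
  script.src = src;
  // Once the code has run, its effects persist; the element can go.
  script.onload = () => script.remove();
  // At document_start, <head> may not exist yet, so fall back to <html>.
  (doc.head || doc.documentElement).appendChild(script);
  return script;
}

// Only runs inside an extension, where chrome.runtime.getURL exists.
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.getURL) {
  injectPageScript(document, chrome.runtime.getURL("interceptor.js"));
}
```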

Voilà! We’ve given ourselves the ability to listen in on all XMLHttpRequest instances spawned by YouTube’s IFrame API.

It’s pretty much magic.

Capture necessary video information

This is where things start to get a bit more technical. We’re going to need to do a few things in order to capture the data we’re interested in:

1. Parse responses from YouTube’s video server.
2. Find codec information inside the appropriate response.
3. Find video buffer data as it is passed to us in chunks.
4. Make video buffer data accessible from outside the <iframe>.

Here’s the code:

Holy moly! That’s some dense code. Fear not! I’ll break it down so that we can better understand what each piece contributes to the whole.

We’re interested in responses from YouTube’s server, not requests, but the easiest way to catch a response is to set up an event handler on the request beforehand. So, we start listening for the current request to finish loading.

YouTube serves video in a plethora of codecs depending on the quality, size, and encoding of a given video, so simply hard-coding a codec would result in a lot of black screens. YouTube’s own player already has to solve this problem, so digging through its source code proves not only warranted but fruitful.

We’re able to find and leverage YouTube’s algorithm for parsing its server’s responses. It has some questionable edge cases, but if it’s good enough for YouTube, then it’s good enough for us. Let’s store the codec information we find in a lookup table for future requests.

Additional responses should hopefully contain chunks of video buffer data. This is easy enough to detect via the responseType property, but the data itself isn’t much use to us unless we know how to interpret it. That’s where our codec lookup table comes in handy.
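A codec lookup table could be as simple as a Map. The sketch below rests on several hypothetical assumptions, and every helper name in it is made up:

- Each format description carries an itag identifier and a MIME type string such as video/mp4; codecs="avc1.4d401f".
- Later chunk requests repeat that itag as a URL query parameter.
- Chunks arrive with responseType "arraybuffer".

These are assumptions about YouTube’s undocumented internals, not a documented API.

```javascript
// Hypothetical lookup table: format id (itag) -> MIME type with codecs.
const codecTable = new Map();

// Record a format description, e.g. one found in a parsed server response.
function recordFormat(itag, mimeType) {
  codecTable.set(String(itag), mimeType);
}

// Find the codec string for a chunk request, assuming its URL carries an
// "itag" query parameter. Returns undefined if the format is unknown.
function codecForChunk(url) {
  const itag = new URL(url).searchParams.get("itag");
  return itag === null ? undefined : codecTable.get(itag);
}

// Assumed: video chunks come back as binary ArrayBuffer responses,
// while format descriptions and other metadata arrive as text.
function isVideoChunk(xhr) {
  return xhr.responseType === "arraybuffer" && xhr.response instanceof ArrayBuffer;
}
```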

Finally, we find ourselves leveraging window.postMessage to pass the ill-gotten gains back to our home turf. However…

Be careful! There’s a major performance bottleneck to take into consideration.

Without transferable objects, you’ll find yourself walking straight into Mordor if performance is “the precious.”

One does not simply walk out of an <iframe> with a reference to a huge buffer of video data!

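No. You’ll need to pass a pointer. You might be thinking, “What the hell? This isn’t C. Pointers in JavaScript?” Not literally, but close: modern browsers support transferable objects. Transferring an ArrayBuffer through window.postMessage moves ownership of it to the receiving side rather than cloning its contents, so even huge buffers can be handed over cheaply.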

Caveat: once an ArrayBuffer has been transferred, it is detached on the sending side; its byteLength drops to 0 and its data can no longer be read there. YouTube’s player still needs the original buffer to play the video, so instead of transferring the original, make a copy of it and transfer that.
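The copy-then-transfer hand-off might look like this sketch. packageChunk, sendChunk, and the { type: "videoChunk" } message shape are hypothetical. The parts taken from the platform are:

- buffer.slice(0) makes the copy.
- The third argument to postMessage lists the objects to transfer rather than clone.

targetWindow is whichever window your listener lives on; for example, a content script in the same frame hears messages posted to that frame’s own window.

```javascript
// Copy the buffer and mark only the copy for transfer, so the original
// (still needed by YouTube's player) stays usable inside the <iframe>.
function packageChunk(buffer, codec) {
  const copy = buffer.slice(0); // A fresh ArrayBuffer with the same bytes.
  return {
    message: { type: "videoChunk", codec, buffer: copy },
    transfer: [copy], // postMessage will move this instead of cloning it.
  };
}

// Browser-only: post the chunk; the receiver should validate what it gets.
function sendChunk(targetWindow, targetOrigin, buffer, codec) {
  const { message, transfer } = packageChunk(buffer, codec);
  targetWindow.postMessage(message, targetOrigin, transfer);
}
```

structuredClone accepts the same transfer option. Outside a browser, structuredClone(message, { transfer }) shows the effect: the transferred copy ends up with a byteLength of 0, while the original buffer is untouched.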

In the end, our long-running background script squirrels away its newly found video data so that it’s ready for our pop-up when needed.