Every quarter at Globo.com, the whole company stops to hold a “Hackday”: basically, 24 hours in which we can free our minds from daily problems and code that cool idea we had on a beautiful Monday afternoon.

This quarter we created a POC (proof of concept) of a recommendation system based on dynamic attributes of videos.

We’re not big data boys

Understanding the simple yet abstract concepts behind user interactions with our systems allowed us to identify something obvious:

It is very hard to provide good video recommendation content!

Who would have guessed, right? It’s obvious. There are people all around the world studying ways to build good approaches to recommending content.

But one thing caught our attention: most of the systems we looked at rely on static attributes to generate recommendations. Even though we’re not big data developers, we could see that in some cases that just is not enough.

Static vs Dynamic

We based our project on the concept that current video recommendation systems are all based on static attributes only, such as title, description, duration, genre, timestamps… but the question is:

What are all those attributes actually providing?

We believe they barely scratch the surface of what the video is really about.

A video, by design, is a time-based sequence of images combined in a specific order and bound with sound. Its goal is to capture a moment with the highest fidelity possible.

It is dynamic by design, since it is constantly changing over time. That makes it very hard to describe well enough using static attributes alone, because as soon as the video starts, the current recommendations may become obsolete: the video can change its subject without notice.

You may think this is nonsense, because it is possible to write a very detailed description that captures the essence of the video in a text fragment, and that is entirely correct. But our current context produces over 300 GB of video per day, and we have around 5M videos! Imagine producing fully detailed descriptions for all those videos; it would be a huge effort.

Then we asked ourselves:

Do we need full detailed text descriptions?

And the proper answer is a big NOPE! What we really need is good recommendations. Besides, who actually reads fully detailed video descriptions? Once you’ve watched the video, what is the point?

Being smart about our needs allows us to create elegant solutions instead of doing all that handwork.

Scenario

Imagine yourself watching a talk show with only 5 recommendations available. As soon as the video starts, the recommendations will probably be just a few random episodes of that same show.

What if, in this video, the host introduces an interesting guest? Maybe the interview was a little short, and you feel the need to know more about some specific subject, or even about the guest.

The natural flow would be to open another tab in your browser and search, which basically means we lost you.

Why can’t we provide you more information about this specific subject as recommendation content?

Goals vs Hackday

Taking our beliefs as facts, our goal was to prove the concept in the given time. We had to show that it is possible to generate suitable recommendations based solely on the analysis of a dynamic attribute.

Up to that point, the question was: how?

Of course, we thought about going all the way down to image analysis, deep learning, and such. But that is pretty hard-core stuff; it would require a lot of time and a large set of skills and tools. Since we had only 24 hours to create and present something usable, we decided to use the simplest dynamic attribute available: subtitles.
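Subtitles are just timestamped text cues, which is what makes them a dynamic attribute: each cue tells you what is being said at a given moment. The SRT format is standard, but the helper below is our own minimal sketch of turning a subtitle file into (start_seconds, text) pairs:

```python
import re

def parse_srt(srt: str):
    """Parse SRT content into (start_seconds, text) cues."""
    cues = []
    # Each SRT block: an index, a "HH:MM:SS,mmm --> HH:MM:SS,mmm" line,
    # then one or more lines of caption text, separated by blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        m = re.match(r"(\d+):(\d+):(\d+),(\d+)", lines[1])
        if not m:
            continue
        h, mi, s, ms = map(int, m.groups())
        start = h * 3600 + mi * 60 + s + ms / 1000.0
        cues.append((start, " ".join(lines[2:])))
    return cues

sample = """1
00:00:01,000 --> 00:00:04,000
Good evening, welcome to the show.

2
00:00:47,500 --> 00:00:52,000
Tonight we talk about the middle east."""

print(parse_srt(sample))
```

With the cues in hand, the text of any moment of the video is just a lookup by timestamp.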

Hard work

In order to understand the subjects, filter content, and extract usable semantic entities from the subtitles, we used an external service called AlchemyAPI.

As their welcome email says:

We see Alchemists around the world create amazing apps with our REST APIs every day. Get inspiration from fellow hackers who let their imaginations run wild.

AlchemyAPI receives text input over a REST API and runs complex analysis to extract machine-usable content, such as tags, subjects, and entities.

As far as we know, it learns more every time content is submitted.

Currently, it is already smart enough to identify the topic of a text, provide tags, as well as understand that “Israel” is a country or “Hillary Clinton” is a person.
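AlchemyAPI has since been retired into IBM Watson, so treat the response shape below as illustrative, reconstructed from memory of the service; the helper itself just pulls (text, type) pairs for the strongest entities out of a response of that shape:

```python
import json

# Illustrative AlchemyAPI-style entity-extraction response; the field
# names ("entities", "type", "relevance", "text") are assumptions based
# on the now-retired service, not a live API contract.
RESPONSE = json.loads("""{
  "status": "OK",
  "entities": [
    {"type": "Country", "relevance": "0.92", "text": "Israel"},
    {"type": "Person",  "relevance": "0.87", "text": "Hillary Clinton"}
  ]
}""")

def extract_entities(response, min_relevance=0.5):
    """Keep (text, type) pairs for entities above a relevance cutoff."""
    return [(e["text"], e["type"])
            for e in response.get("entities", [])
            if float(e["relevance"]) >= min_relevance]

print(extract_entities(RESPONSE))
# → [('Israel', 'Country'), ('Hillary Clinton', 'Person')]
```

This is exactly the kind of output we needed: “Israel is a country” and “Hillary Clinton is a person”, ranked by how relevant each entity is to the text.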

Proof of Concept

Our initial prototype was time-based: every 30 seconds of video, we analyzed the closed caption content, extracted something usable, and matched other videos in our database, giving dynamic content back to the user as recommendation links.
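That loop can be sketched as follows. The function and variable names are ours, `window_text` groups subtitle cues into 30-second buckets, and the catalogue matching is a plain entity-overlap count standing in for whatever the real system would do:

```python
def window_text(cues, window=30.0):
    """Group (start_seconds, text) subtitle cues into fixed time windows."""
    buckets = {}
    for start, text in cues:
        buckets.setdefault(int(start // window), []).append(text)
    return {k: " ".join(v) for k, v in sorted(buckets.items())}

def recommend(entities, catalogue, top=5):
    """Rank catalogue videos by how many current entities they share."""
    scored = [(len(entities & tags), vid) for vid, tags in catalogue.items()]
    return [vid for score, vid in sorted(scored, reverse=True) if score > 0][:top]

# Hypothetical data standing in for the real pipeline.
cues = [(2.0, "welcome to the news"),
        (47.0, "conflict in the middle east"),
        (108.0, "the presidential debate with Hillary Clinton")]
catalogue = {
    "video-a": {"middle east", "conflict"},
    "video-b": {"debate", "Hillary Clinton"},
    "video-c": {"cooking"},
}

windows = window_text(cues)        # one text blob per 30-second window
# Pretend entity extraction (the AlchemyAPI step) already ran on window 1:
print(recommend({"middle east", "conflict"}, catalogue))  # → ['video-a']
```

Every 30 seconds the current window’s entities change, so the recommendation list changes with them, which is the whole point of a dynamic attribute.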

As a stable and successful example, we used a news program, since its subject changes a lot and varies across several common kinds of topics.

It is possible to see that the subject of the current section of the video, around 00:47, is related to the middle east. Fortunately, our suggestions are closely related to the main subject: out of 5 recommendations, only one was off-topic. So far it is working!

As we foresaw, the recommendations constantly change over time, based on what is present in the current section of the same video.

The next section, around 01:48, is about the American presidential elections:

As expected, the recommendations were pretty much suitable for the content. They are all about related topics, such as Hillary, democrats, candidates, the debate…

Unfortunately, we couldn’t test this in production to gather better data on user interactions, but we did prove that it’s possible to analyze closed captions and extract machine-readable content to generate dynamic recommendations.