We recently caught up with Kevin Wong, a business intelligence professional and machine learning enthusiast, to talk about his latest project: Building a better recommender system for Steam.

For anyone who isn’t familiar, Steam is a digital distribution platform for games, developed by Valve Corporation. It boasts over 75 million members, with between 5 and 8 million online at any time, and estimated revenues of around $1.75 billion.

To put those figures in perspective, entertainment giant Netflix had only 50 million members at the end of July. Gabe Newell, MD at Valve, has claimed the company is more profitable per head than either Apple or Google. Clearly, Steam is a force to be reckoned with, so how can one man hope to improve on its product?

Hi Kevin, tell us a bit about yourself, and how you got into machine learning.

I am currently a Tableau consultant, and I’m well trained in visual analytics principles and data analysis. I realised there’s much more that can be done to unlock data’s full potential, and I am passionate about it, so I enrolled in Data Science Retreat, a 3-month data science bootcamp in Berlin. That’s how I got into machine learning, amongst other areas such as Hadoop and software engineering.

What led you to focusing on Steam for this project?

I like video games and I have been a Steam user for some time already. I really like using the service to purchase and download video games, without ordering them and having to wait for delivery. Having said that, I am often frustrated by the recommendations Steam gives me, either through the list of “More Like This” games on its website, or the personalised suggestions after logging in.

I surveyed a few other Steam users and found that it’s a common sentiment. One member even told me he really wanted to expand his library, but the poor recommendations actually hold him back from purchasing.

How exactly does the recommender system work, and what technologies are you using?

Steam’s approach is to look at the similarity between the games you already own compared to other games in their library, based on some predefined attributes.

My recommender looks at patterns in game ownership and patterns in actual time played (i.e. whether users who spend a lot of time in game A also spend a lot of time in game B). For instance, if a lot of Counter-Strike players also own Half-Life, and they spend a lot of time on both games, Half-Life will be one of the top recommendations for users who like Counter-Strike.
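
For readers who want to see the playtime idea in code, here is a minimal sketch of ranking games by how closely hours played move together. It is an illustration under stated assumptions, not Kevin's actual implementation: the `playtime` data, the user names, and the `pearson` and `recommend` helpers are all hypothetical, and a correlation coefficient is just one plausible way to measure the pattern he describes.

```python
from math import sqrt

# Hypothetical hours played per user; every name and number here is made up.
playtime = {
    "Counter-Strike": {"ann": 120.0, "bob": 80.0, "cat": 5.0, "dan": 60.0},
    "Half-Life": {"ann": 40.0, "bob": 30.0, "cat": 2.0, "dan": 25.0},
    "Puzzle Game": {"ann": 1.0, "bob": 3.0, "cat": 50.0, "dan": 2.0},
}


def pearson(game_a, game_b):
    """Pearson correlation of hours played, over users who own both games."""
    users = sorted(set(playtime[game_a]) & set(playtime[game_b]))
    if len(users) < 2:
        return 0.0  # not enough shared players to say anything
    xs = [playtime[game_a][u] for u in users]
    ys = [playtime[game_b][u] for u in users]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant playtime carries no signal
    return cov / sqrt(var_x * var_y)


def recommend(seed_game):
    """Rank every other game by playtime correlation with the seed game."""
    others = [g for g in playtime if g != seed_game]
    return sorted(others, key=lambda g: pearson(seed_game, g), reverse=True)


print(recommend("Counter-Strike"))
```

In this toy data the heavy Counter-Strike players are also the heavy Half-Life players, so Half-Life ranks first, while the puzzle game (played mostly by someone who barely touches Counter-Strike) ranks last.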

For the technically minded, Log-Likelihood Ratios and Pearson’s r correlation are the two recommendation algorithms I have implemented in Python. The website mainly uses PHP and JavaScript to interact with users and load recommendations.
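
As a rough illustration of the first of these, the sketch below computes a log-likelihood ratio (the G-test form often attributed to Dunning) from a 2x2 table of game co-ownership. The interview does not say how the score is computed on the site, so treat this as one standard formulation rather than Kevin's code; the `libraries` data and the `llr` and `ownership_llr` helpers are hypothetical.

```python
from math import log


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G-test) for a 2x2 contingency table.

    k11: users owning both games, k12: only game A,
    k21: only game B, k22: neither game.
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    score = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing
                expected = rows[i] * cols[j] / total
                score += o * log(o / expected)
    return 2.0 * score


# Hypothetical game libraries; every user and title here is made up.
libraries = {
    "ann": {"Counter-Strike", "Half-Life"},
    "bob": {"Counter-Strike", "Half-Life"},
    "cat": {"Puzzle Game"},
    "dan": {"Counter-Strike", "Half-Life", "Puzzle Game"},
    "eve": {"Puzzle Game"},
    "fay": {"Counter-Strike"},
}


def ownership_llr(game_a, game_b):
    """Build the 2x2 co-ownership table for two games and score it."""
    both = only_a = only_b = neither = 0
    for games in libraries.values():
        has_a, has_b = game_a in games, game_b in games
        if has_a and has_b:
            both += 1
        elif has_a:
            only_a += 1
        elif has_b:
            only_b += 1
        else:
            neither += 1
    return llr(both, only_a, only_b, neither)


print(ownership_llr("Counter-Strike", "Half-Life"))
```

A score near zero means ownership of the two games looks independent. One design point worth noting: the ratio measures how surprising the table is, not the direction of the association, so a practical recommender would also check that co-ownership is higher than expected before recommending a game.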

Building recommender systems for third party services is an interesting idea. What advantages do you think it offers the consumer?

Well, it wouldn’t be as interesting if Steam had come up with an excellent recommendation system in the first place! If Steam really put effort into this, they would have the potential to improve their recommendation engine by adding information such as users’ browsing histories, wishlists, and other personal details.

In contrast, a recommendation site built by a video game fan is impartial, as it wouldn’t be promoting games that may provide extra incentives to retailers. I also have the freedom to include features that fit the needs of my users, even though some features may be commercially undesirable. There is no incentive for non-commercial fan sites to store personal data, making it safe for users who have concerns about privacy.

What were the technical challenges involved in developing an external recommender system?

The most challenging aspect is obtaining the data. Unlike Steam, I only have limited access to non-sensitive information, and only for some users. Since there is an API limit of 100,000 calls per day per Steam user, it really took time to download the data I needed. Fortunately, a few of my friends loaned the API keys associated with their Steam accounts to the project.

It is also very difficult to test the recommendation system objectively, as technical limitations prevent me from comparing my results against Steam’s on external metrics such as click-through rates. I have therefore resorted to internal tests, using a metric for recommendation precision from a 2011 research paper by G. Shani and A. Gunawardana. The “Precision at N” measure suggests my recommendations have a score of 45% compared to Steam’s 27%. This provides some evidence that my recommendation engine is better from a particular perspective, but more needs to be done to validate this via user experience.
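
For reference, “Precision at N” is usually defined as the share of the top N recommendations that turn out to be relevant. The sketch below shows that definition only. The interview does not describe the exact test protocol, so the choice of what counts as relevant (here, games hidden from the recommender that the user actually owns) is an assumption, and the function name and sample titles are hypothetical.

```python
def precision_at_n(recommended, relevant, n):
    """Fraction of the top-n recommended items that appear in the relevant set."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    top_n = recommended[:n]
    hits = sum(1 for item in top_n if item in relevant)
    return hits / n


# Hypothetical example: ranked recommendations versus held-out owned games.
ranked = ["Game A", "Game B", "Game C", "Game D"]
held_out = {"Game B", "Game D", "Game E"}

print(precision_at_n(ranked, held_out, 4))
```

Here two of the four recommended games are in the held-out set, so the score is 0.5. Averaging this value over many test users gives a single figure of the kind quoted above.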

What has been your feedback so far?

User feedback is positive, with some people who really like the site and the recommendations. There are already some feature requests too, such as recommending games from a member’s backlog, i.e. the games that they have bought but haven’t got around to trying yet. I’m aiming to improve the site in my spare time based on the feedback I get.

How was your time at Data Science Retreat?

It has been a gruelling and challenging 3 months! I have definitely gained a lot from it, not only on the technical side but also in the business and communication aspects of data science. DSR isn’t going to transform me into a magician overnight, but I am confident that I can learn new techniques very quickly, building on the firm foundation and the broad set of skills I have developed during the bootcamp.

Thanks Kevin!

If you are a video game fan with a Steam account, feel free to check out Kevin’s project at NextVideoGame.com.

(Image credit: VALVE Software)