Why Netflix Never Implemented The Algorithm That Won The Netflix $1 Million Challenge

from the times-change dept

A year into the competition, the Korbell team won the first Progress Prize with an 8.43% improvement. They reported more than 2000 hours of work in order to come up with the final combination of 107 algorithms that gave them this prize. And, they gave us the source code. We looked at the two underlying algorithms with the best performance in the ensemble: Matrix Factorization (which the community generally called SVD, Singular Value Decomposition) and Restricted Boltzmann Machines (RBM). SVD by itself provided a 0.8914 RMSE (root mean squared error), while RBM alone provided a competitive but slightly worse 0.8990 RMSE. A linear blend of these two reduced the error to 0.88. To put these algorithms to use, we had to work to overcome some limitations, for instance that they were built to handle 100 million ratings, instead of the more than 5 billion that we have, and that they were not built to adapt as members added more ratings. But once we overcame those challenges, we put the two algorithms into production, where they are still used as part of our recommendation engine.
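To make the "linear blend" idea in that excerpt concrete, here is a minimal sketch of how two imperfect predictors can be combined into one with a lower RMSE. Everything in it is an assumption for illustration: the data is synthetic (not Netflix ratings), the two "models" are just the true rating plus independent noise (not real SVD or RBM output), and the helper names `rmse`, `fit_linear_blend` and `blend` are hypothetical. It fits the blend weights by ordinary least squares, which is one common way to do a linear blend and not necessarily what the Korbell team or Netflix did.

```python
import math
import random


def rmse(preds, truth):
    """Root mean squared error between predictions and true ratings."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, truth)) / len(truth))


def fit_linear_blend(a, b, truth):
    """Least-squares weights (wa, wb) minimizing sum((wa*a + wb*b - truth)^2).

    Solves the 2x2 normal equations directly with Cramer's rule.
    """
    saa = sum(x * x for x in a)
    sbb = sum(y * y for y in b)
    sab = sum(x * y for x, y in zip(a, b))
    sat = sum(x * t for x, t in zip(a, truth))
    sbt = sum(y * t for y, t in zip(b, truth))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("predictors are collinear; blend weights are not unique")
    wa = (sat * sbb - sbt * sab) / det
    wb = (sbt * saa - sat * sab) / det
    return wa, wb


def blend(a, b, wa, wb):
    """Weighted sum of the two predictors."""
    return [wa * x + wb * y for x, y in zip(a, b)]


# Synthetic stand-in data (assumption): true ratings on a 1-5 scale, and two
# models whose errors are independent of each other.
rng = random.Random(0)
truth = [rng.randint(1, 5) for _ in range(2000)]
model_a = [t + rng.gauss(0, 0.90) for t in truth]  # stand-in for one predictor
model_b = [t + rng.gauss(0, 0.95) for t in truth]  # stand-in for a second one

wa, wb = fit_linear_blend(model_a, model_b, truth)
blended = blend(model_a, model_b, wa, wb)

rmse_a = rmse(model_a, truth)
rmse_b = rmse(model_b, truth)
rmse_blend = rmse(blended, truth)

print(f"model A RMSE: {rmse_a:.4f}")
print(f"model B RMSE: {rmse_b:.4f}")
print(f"blend RMSE:   {rmse_blend:.4f} (weights {wa:.3f}, {wb:.3f})")
```

Because the weights here are fit on the same data they are scored on, the blend can never do worse than either model alone. In practice you would fit the weights on held-out ratings, and the gain depends on how different the two models' mistakes are.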

We evaluated some of the new methods offline but the additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment.

One of the reasons our focus in the recommendation algorithms has changed is because Netflix as a whole has changed dramatically in the last few years. Netflix launched an instant streaming service in 2007, one year after the Netflix Prize began. Streaming has not only changed the way our members interact with the service, but also the type of data available to use in our algorithms. For DVDs our goal is to help people fill their queue with titles to receive in the mail over the coming days and weeks; selection is distant in time from viewing, people select carefully because exchanging a DVD for another takes more than a day, and we get no feedback during viewing. For streaming members are looking for something great to watch right now; they can sample a few videos before settling on one, they can consume several in one session, and we can observe viewing statistics such as whether a video was watched fully or only partially.

You probably recall all the excitement that went around when a group finally won the big Netflix $1 million prize in 2009, improving Netflix's recommendation algorithm by 10%. But what you might not know is that Netflix never implemented the winning solution. Netflix recently put up a blog post discussing some of the details of its recommendation system, which (as an aside) explains why the winning entry never was used. First, they note that they did make use of an earlier bit of code that came out of the contest, the Korbell team's Progress Prize work described in the first excerpt above.

Neat. But the winning prize? Eh... just not worth it. As the second excerpt puts it, the additional accuracy gains did not seem to justify the engineering effort.

It wasn't just that the improvement was marginal, but that Netflix's business had changed, and the way customers used its product, and the kinds of recommendations the company had done, had shifted too. Suddenly, the prize winning solution just wasn't that useful -- in part because many people were streaming videos rather than renting DVDs -- and it turns out that the recommendation for streaming videos is different than for rental viewing a few days later, as the third excerpt explains.

The viewing data obviously makes a huge difference, but I also find it interesting that there's a clear distinction in the kinds of recommendations that work if people are going to "watch now" vs. "watch in the future." I think this is an issue that Netflix probably has faced on the DVD side for years: when people rent a movie that won't arrive for a few days, they're making a bet on what they want at some future point. And, people tend to have a more... optimistic viewpoint of their future selves. That is, they may be willing to rent, say, an "artsy" movie that won't show up for a few days, feeling that they'll be in the mood to watch it a few days (weeks?) in the future, knowing they're not in the mood immediately. But when the choice is immediate, they deal with their present selves, and that choice can be quite different.

It would be great if Netflix revealed a bit more about those differences, but it is already interesting to see that the shift from delayed gratification to instant gratification clearly makes a difference in the kinds of recommendations that work for people.

Filed Under: contest, data, recommendation algorithm, streaming

Companies: netflix