Introduction

As a professional social network serving more than 500 million members worldwide, LinkedIn is the premier destination for professional conversations. A wide variety of posts attract significant engagement, and some go viral, drawing likes and comments in large numbers. In many cases, comments on these popular threads became so numerous that their value was lost in the noise.

To surface comments valuable to LinkedIn members, we’ve built a scalable comment ranking system that uses machine learning (ML) to provide a personalized conversational experience to each member visiting the LinkedIn content ecosystem. This blog post details our design, the scalability challenges that we faced and overcame, and the tight latency budgets that the system operates under.

History

The content on the LinkedIn feed is rich, diverse, and widely consumed. Some of it is generated organically by our members (e.g., the post shown above, authored by LinkedIn founder Reid Hoffman in response to a news article), and some of it is from third-party websites that regularly post high-quality content for distribution. These articles draw significant engagement from our members via comments and likes. Until recently, however, we lacked the ability to convert this engagement into meaningful conversations between people on the platform.

The default mode for ranking comments on the feed was rank by recency: if you were the last person to post a comment on a popular thread, your comment would show up first. We had no understanding of the comment’s content, no notion of personalization, and no knowledge of the engagement that these comments were drawing.

This problem garnered our attention in mid-2016. We set up a simple minimum viable product (MVP) that ranked comments by the number of likes they had gathered, using that number as a simple proxy for comment quality. The MVP was successful in that it demonstrated that ranked comments had value. However, it also demonstrated the weaknesses of relying on a single, non-personalized, after-the-fact feature: comments would be highly ranked only after they had garnered enough social proof. Ideally, we wanted to pick out engaging comments ahead of time. In addition, latency and scaling concerns prevented us from further productionizing the system.
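
To make the contrast between the two baselines concrete, here is a minimal sketch in Python. It is an illustration only, not the production code: the `Comment` type, its fields, and both function names are hypothetical, and breaking like-count ties by recency is an assumption rather than something the MVP is known to have done.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    # Hypothetical record type, used only for this illustration.
    comment_id: str
    created_at: int  # Unix timestamp in seconds
    like_count: int


def rank_by_recency(comments):
    """Original default: the newest comment is shown first."""
    return sorted(comments, key=lambda c: c.created_at, reverse=True)


def rank_by_likes(comments):
    """MVP-style ranking: most-liked first.

    Ties are broken by recency, which is an assumption made for this sketch.
    """
    return sorted(
        comments,
        key=lambda c: (c.like_count, c.created_at),
        reverse=True,
    )


thread = [
    Comment("early_insightful", created_at=1000, like_count=42),
    Comment("middle", created_at=2000, like_count=3),
    Comment("latest", created_at=3000, like_count=0),
]

recency_order = [c.comment_id for c in rank_by_recency(thread)]
likes_order = [c.comment_id for c in rank_by_likes(thread)]
```

In this toy thread, recency ranking puts `latest` first regardless of its quality, while like-based ranking puts `early_insightful` first, but only because it has already accumulated likes. A brand-new comment always starts at zero likes, which is the after-the-fact weakness described above.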