Why Serendipitous Discovery?

Imagine walking into a candy store as a kid, and the owner hands you a piece of candy. You try it and tell him whether you liked it, and he hands you another piece. With each round, the candy you receive matches your tastes a little better. If you already know exactly which candy you want before you arrive, this approach is less than ideal. But if you have no idea what the store carries, it makes much more sense.

There are Known Knowns

To be honest, I’m not a big fan of politics or of the Iraq War of the early 2000s, but this Donald Rumsfeld quote profoundly altered the way I understand the limits of my own knowledge:

Reports that say that something hasn’t happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns — the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones.

Known knowns are things you know and are aware that you know (for example, that there are 24 hours in a day). Known unknowns are the gaps in your knowledge that you’re aware of (for example, you might be aware that you know nothing about quantum physics). Unknown knowns, although not mentioned in the quote, are things you think you “know” that are actually false, like believing the world is flat (an alternative interpretation is the things you know instinctively without being aware that you know them). Finally, unknown unknowns are things you don’t know and aren’t even aware of not knowing.

Search engines are ideal for converting your known unknowns into known knowns, assuming the question you’re wondering about has an answer. Unknown unknowns and unknown knowns need a different approach, since you won’t be consciously searching for them. At first, social media might sound like it could fill this void, but on closer examination it is still less than ideal. First, the information and links in your social media feed are often shaped by the preferences of the people you’re friends with or follow. Second, you still have to click on the links in your feed, and we all know that links with familiar or sensational titles tend to receive more clicks.

Almost by definition, there is no obvious way for a user to discover information that falls into their unknown knowns or unknown unknowns. There is, however, one easily implemented but flawed solution that I haven’t mentioned yet: a system that holds a large database of website URLs and serves the user one at random each time. Its obvious drawback is that it ignores the individual user’s preferences (imagine being served five websites on French ballet when you’re really interested in racing cars). So Gimmeserendipity relies on the user’s repeated feedback, in the form of ratings, to learn their reading preferences.

Gimmeserendipity doesn’t use Collaborative Filtering (as of 9/2019)

Collaborative filtering is a commonly used technique for recommending new items (books, videos, potential dating partners, etc.) to a user by comparing how that user rated existing items with how other users rated the same items, and then suggesting what similar users liked. This is in contrast with content-based filtering, which recommends items whose features or content resemble the ones the user rated highly. Many recommendation systems in practice combine both, often leaning more towards collaborative filtering.

To understand what collaborative filtering accomplishes at a high level, picture yourself running a recommendation system and opening its master spreadsheet. All of the users are on one axis, all of the items being recommended are on the other, and each cell holds the rating a particular user gave a particular item. Of course, not every user has rated every item, so some cells will be blank. Collaborative filtering works to fill in estimates for all of the blank cells. Notice that, unlike content-based filtering, it needs no domain knowledge of the problem at hand, be it recommending books, music, or pictures. Unfortunately, it doesn’t work well for a new recommendation service with very few users and very little rating data (known as the cold-start problem).
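To make the spreadsheet picture concrete, here is a minimal sketch of one simple flavor of collaborative filtering (user-based, with cosine similarity). The users, items, ratings, and function names are all made up for illustration; this is not code from Gimmeserendipity or from any particular library.

```python
from math import sqrt

# Hypothetical ratings spreadsheet: one row per user, one column per item.
# None marks a blank cell (the user has not rated that item).
RATINGS = {
    "alice": {"book_a": 5, "book_b": 4, "book_c": None},
    "bob":   {"book_a": 5, "book_b": 5, "book_c": 4},
    "carol": {"book_a": 1, "book_b": 2, "book_c": 5},
    "dave":  {"book_a": None, "book_b": None, "book_c": None},  # brand-new user
}


def cosine_similarity(u, v):
    """Cosine similarity computed only over items both users have rated."""
    shared = [i for i in u if u[i] is not None and v.get(i) is not None]
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Estimate a blank cell as a similarity-weighted average of the
    ratings other users gave the same item. Returns None when there is
    nothing to base an estimate on."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, row in ratings.items():
        if other == user or row.get(item) is None:
            continue
        sim = cosine_similarity(ratings[user], row)
        weighted_sum += sim * row[item]
        total_weight += abs(sim)
    return weighted_sum / total_weight if total_weight else None


def fill_blanks(ratings):
    """Return a copy of the spreadsheet with blank cells replaced by estimates."""
    return {
        user: {
            item: rating if rating is not None else predict(ratings, user, item)
            for item, rating in row.items()
        }
        for user, row in ratings.items()
    }


if __name__ == "__main__":
    for user, row in fill_blanks(RATINGS).items():
        print(user, row)
```

Nothing in the sketch looks at what the books are about; it only compares rating patterns. The row for "dave" also shows the cold-start problem in miniature: with no ratings of his own, he is similar to nobody, so every estimate comes back as None.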

Gimmeserendipity currently relies on a content-based approach for making recommendations, mainly to work around the cold-start problem. It has separate engines for pictures and for websites, and each extracts completely different features to feed into its machine learning models. The models are retrained whenever the user has rated several new websites or pictures. Now that a decent number of ratings have come in from the many users who signed up in recent months, I considered adding collaborative filtering to the recommendation engine, but I’ve decided to hold off for now for one major reason: feedback loops.
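As a rough illustration of the content-based idea, here is a toy sketch. Everything in it is an assumption made for the example: the word-count features, the `ContentProfile` class, and the `RETRAIN_EVERY` threshold are hypothetical and far simpler than real feature extraction or machine learning models would be.

```python
from collections import Counter
from math import sqrt

RETRAIN_EVERY = 3  # hypothetical: rebuild the profile after this many new ratings


def features(text):
    """Toy feature extractor: lowercase word counts."""
    return Counter(text.lower().split())


class ContentProfile:
    """Per-user model built only from the content of items that user rated."""

    def __init__(self):
        self.rated = []           # (features, +1 or -1) for every rated item
        self.weights = Counter()  # learned per-word preference weights
        self.pending = 0          # ratings received since the last retrain

    def rate(self, text, liked):
        self.rated.append((features(text), 1 if liked else -1))
        self.pending += 1
        if self.pending >= RETRAIN_EVERY:
            self.retrain()

    def retrain(self):
        """Rebuild the weights from scratch using every rating so far."""
        self.weights = Counter()
        for feats, sign in self.rated:
            for term, count in feats.items():
                self.weights[term] += sign * count
        self.pending = 0

    def score(self, text):
        """Cosine similarity between an unseen item and the learned profile."""
        feats = features(text)
        dot = sum(self.weights[term] * count for term, count in feats.items())
        norm = sqrt(sum(c * c for c in feats.values())) * sqrt(
            sum(w * w for w in self.weights.values())
        )
        return dot / norm if norm else 0.0


if __name__ == "__main__":
    profile = ContentProfile()
    profile.rate("formula one racing cars", liked=True)
    profile.rate("rally racing cars engine", liked=True)
    profile.rate("french ballet history", liked=False)  # third rating triggers a retrain
    print(profile.score("racing cars news"))
    print(profile.score("ballet performance"))
```

The profile is built from a single user’s ratings, so it needs no data from anyone else, which is why this style of model sidesteps the cold-start problem for a service with few users.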

A Word about Feedback Loops, Echo Chambers, and Conspiracy Theories

The ugly side of collaborative filtering started to make waves in the media in the late 2010s, as services like Facebook, Twitter, and YouTube were accused of fueling conspiracy theories through their recommendation systems, which use collaborative filtering. There are numerous theories about how collaborative filtering creates echo chambers through feedback loops which, in the worst case, eventually leave one’s feed cluttered with conspiracy theories and other extremist content. Content-based filtering isn’t immune to this effect either, although it hasn’t been shown to exhibit it to the same degree. An echo chamber is a bad outcome in itself, and it would also defeat my mission of making users aware of their unknown knowns and unknown unknowns. Feedback loops also tend to funnel a disproportionate share of traffic and attention to content that is already popular and highly rated, making it difficult for newly submitted content to reach a wide audience (hence Kevin won’t get much traffic to his newly submitted omelette recipe unless he’s really lucky, since the popular submissions are soaking up all of the visitors). Until further research on the link between collaborative filtering and feedback loops becomes available, I won’t be adding collaborative filtering to Gimmeserendipity.
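The popularity side of this can be illustrated with a toy simulation. It is a deliberately crude model and not a description of how any real service works: the assumption is simply that each visitor is steered to an item with probability proportional to the clicks it already has, compared against a baseline where every item is equally likely. The function name and all parameters are made up for the example.

```python
import random


def simulate(popularity_driven, visitors=1000, items=10, newcomer_at=500, seed=0):
    """Return the final click count per item.

    popularity_driven=True  -> items are shown in proportion to past clicks
    popularity_driven=False -> every item is equally likely to be shown
    A new submission (think of Kevin's omelette recipe) is appended
    part-way through and is always the last entry of the result.
    """
    rng = random.Random(seed)
    clicks = [1] * items  # every existing item starts with a single click
    for step in range(visitors):
        if step == newcomer_at:
            clicks.append(1)  # the newcomer also starts with a single click
        weights = clicks if popularity_driven else [1] * len(clicks)
        chosen = rng.choices(range(len(clicks)), weights=weights, k=1)[0]
        clicks[chosen] += 1
    return clicks


if __name__ == "__main__":
    loop = simulate(popularity_driven=True)
    flat = simulate(popularity_driven=False)
    print("popularity-driven:", loop, "newcomer clicks:", loop[-1])
    print("uniform baseline: ", flat, "newcomer clicks:", flat[-1])
```

In the popularity-driven run, early leads compound and the late arrival struggles to catch up, while the uniform baseline spreads clicks roughly evenly. The exact numbers depend on the seed.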