“A library is not information; it is a means of preserving information. In every case, before memory or information can be stored, someone must decide what must be stored. Someone must choose. Someone must curate.” – John Scalzi

To deem curation necessary, one must feel a pressing desire to consume and experience things while remaining keenly aware of one’s own mortality and limitedness. Sylvia Plath famously embodied that struggle: “I can never read all the books I want; I can never be all the people I want and live all the lives I want. I can never train myself in all the skills I want. And why do I want? I want to live and feel all the shades, tones and variations of mental and physical experience possible in my life. And I am horribly limited.” She was paralyzed by the breadth of choice and the exclusivity of commitment. She did not simply want the best; she wanted it all: to compare, contrast, and appreciate.

The struggle of being mortal

Objects of art float about us in a surreal post-scarcity not unlike the exhaustiveness of Borges’s Library of Babel: we have immediate access to more literature, film, music, games, and art than we could ever consume in a lifetime… and a lot of it is trash (with our mortality in mind, of course). Backlogs plague our generation, and inaccessibility barely registers next to the limits of our own existence and capacity to consume. In such a world, curation becomes the only antidote to analysis paralysis and post-consumption dissonance.

The first step is to accept the impossibility of experiencing it all, and that exhaustiveness cannot be our goal. Where do we go from here, then? How do we make ‘the best choice’ when we are unable to delegate our constantly developing preferences to an entity with a seemingly infinite capacity? And do we necessarily want to only experience ‘the best’? Is our goal to deepen our preferences or deviate from them? What do we optimize for: Exploration? Diversity? Depth? Smooth, prolonged trajectories? These are all questions that afflict modern curators as much as they afflict the software developers working on curation algorithms.

“This much is already known: for every sensible line of straightforward statement, there are leagues of senseless cacophonies, verbal jumbles and incoherences.” – Jorge Luis Borges (The Library of Babel)

The birth of the cabinet of curiosities, or cabinet of wonder (Kunstkabinett or Wunderkammer), perfectly illustrates the moment certain individuals decided they were overwhelmed by the abundance of objects. It marked the inception of the ‘special room’, where items worthy of exceptional attention were put on display. The shelves symbolized not only an itch to collect and organize, but also an urge to choose and highlight.

An 18th-century illustration of a Wunderkammer (cabinet of wonder)

The curation behind a Wunderkammer was not necessarily about the features of an object; it was a literal, at-a-glance reflection of the curator’s preferences and interests. You are invited to marvel and appreciate, not to find an object of personal interest to you. As curators grew and developed niches, they began recruiting new pieces into their collections based on features. What was organic became systematic, methodical, and in line with a visitor’s expectations. This transition gave birth to ‘the museum’: curation became less of a vanity project and more about catering to an audience. The journey from ‘personal preference’ to ‘catering to others’ happens often: the playlist, once crafted over time by an individual expressing their own preferences, is being overtaken by human-supported algorithms that cater to others.

Labeling objects has always served the marginal role of aiding choice: if we cannot make up our minds about whether to dive into a music album, surely we can use a label to infer, with some level of certainty, whether it is a good use of our time. But we can do better.

Popularity is not a useful feature (sometimes)

Visiting Billboard’s archive of charts, you will find music charts dating back to the 1950s. In fact, it all started earlier than that: in the summer of 1940, Billboard magazine published its first comprehensive Music Popularity Chart (scans of old copies are available on American Radio History, and I find myself browsing through them from time to time). I love dissecting its contents because so much of it mirrors today’s music streaming feeds.

Some things changed, some things did not

Notice how this chart is not labeled with anything other than ‘popularity’, compiled simply by aggregating counts from several locations and radio stations. That said, I am not a fan of such charts for music exploration, and not because I believe they contain bad songs, per se. Salganik, Dodds, and Watts (2006) conducted an experimental study showing that when listeners can see how popular songs are before they listen to and rate them, both the inequality and the unpredictability of popularity become more severe. Social proof is at work: at some point, a listener relaxes their judgement and trusts society. It is how you save mental energy. It is worth noting, however, that even with social proof the very ‘best’ songs became popular and the ‘worst’ songs mostly ended up at the bottom; but most music lies in the middle, and this is where it goes wrong. Regardless of how a song reaches the status of ‘popular’, using popularity alone to recommend music will only let inequality persist: ‘hidden gems’ will become even more buried.
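
The rich-get-richer dynamic is easy to reproduce in a toy model. The sketch below is not a reproduction of the Salganik, Dodds, and Watts experiment; it assumes a hypothetical catalogue of 50 songs, each with a random latent ‘appeal’, and listeners who always play the most appealing of the five songs they are shown. It compares a feed that shows the five most-played songs with one that shows five songs at random, and measures the resulting inequality with the Gini coefficient (0 means plays are spread evenly, values near 1 mean a few songs take everything).

```python
import random


def gini(counts):
    """Gini coefficient of non-negative counts: 0 = perfectly equal, near 1 = concentrated."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n, with x sorted ascending and i = 1..n
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


def simulate(strategy, n_songs=50, n_listeners=2000, shown=5, seed=0):
    """Toy feed: each listener sees `shown` songs and plays the most appealing one.

    strategy="popularity" shows the most-played songs so far (ties broken by index);
    any other value shows a uniformly random sample. All parameters are hypothetical.
    """
    rng = random.Random(seed)
    appeal = [rng.random() for _ in range(n_songs)]  # hypothetical latent quality
    plays = [0] * n_songs
    for _ in range(n_listeners):
        if strategy == "popularity":
            candidates = sorted(range(n_songs), key=lambda s: -plays[s])[:shown]
        else:
            candidates = rng.sample(range(n_songs), shown)
        choice = max(candidates, key=lambda s: appeal[s])
        plays[choice] += 1
    return plays


if __name__ == "__main__":
    popular_feed = simulate("popularity")
    random_feed = simulate("random")
    print("popularity feed Gini:", round(gini(popular_feed), 2))
    print("random feed Gini:    ", round(gini(random_feed), 2))
```

In this setup the popularity feed never shows anything outside its first five songs, so every play lands on a single track whether or not it is the most appealing one in the catalogue. Real systems mix in other signals, but that lock-in is the point.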

“Discovery can feel like work, and we wanted it to feel very human and natural like the selections that are powering it.” – Matt Ogle (Discover Weekly | Spotify)

What (a lot of) adults should envy in (most of) the young is their dedication and patience in consuming indiscriminately (either from an already curated list or at random) with minimal prejudice or filtering. Funnily enough, this is the most essential step towards becoming a ‘curator’; without it, a person will keep relying on social proof. In a sense, this process mirrors “training data collection”: without it, you have to rely on someone else’s data and resulting model to cater to your own as-yet undiscovered preferences. To curate is to care immensely about context.

Example of explicitly stated ‘taste’

Recommendation tools are not immune to cold start problems. Quantifying taste and preferences is insanely difficult, sometimes downright impossible. If you want to experience this first-hand, contrast the reactions you get from asking people “What is your favorite film and how much do you like it?” versus “Do you like [film 1]? Would you rank it above or below [film 2]?” The latter is significantly easier to answer.
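
Pairwise answers like these can still be turned into a ranking. One common way to do it, borrowed from chess rather than from MovieLens (which I am not claiming uses it), is an Elo-style update: every ‘A above B’ answer nudges A’s rating up and B’s down, by more when the answer is surprising. The film titles, the starting rating of 1000, and the step size `k=32` are illustrative assumptions.

```python
def expected_score(r_a, r_b):
    """Probability that A is preferred over B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def rank_from_comparisons(comparisons, k=32.0, initial=1000.0):
    """Turn (preferred, other) answers into a ranking via sequential Elo updates."""
    ratings = {}
    for winner, loser in comparisons:
        r_w = ratings.setdefault(winner, initial)
        r_l = ratings.setdefault(loser, initial)
        delta = k * (1.0 - expected_score(r_w, r_l))  # bigger when the answer is surprising
        ratings[winner] = r_w + delta
        ratings[loser] = r_l - delta
    ranking = sorted(ratings, key=ratings.get, reverse=True)
    return ranking, ratings


if __name__ == "__main__":
    answers = [("Film A", "Film B"), ("Film B", "Film C"), ("Film A", "Film C")]
    ranking, ratings = rank_from_comparisons(answers)
    print(ranking)  # ['Film A', 'Film B', 'Film C']
```

Because each update only needs a ‘which one?’ answer, the user never has to put a number on how much they like a film; the numbers fall out of the comparisons.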

MovieLens has an intuitive and smart way of overcoming the cold start problem

But we don’t need explicitly stated truth when we are assessing something as dynamic and amorphous as taste. A user might be genuine when they say they enjoy avant-garde jazz more than electropop, but their feed is more playable to them when we assume they prefer electropop and recommend more of it. (I would argue that optimizing for enjoyability is an easier task than, say, optimizing to aid exploration. Optimizing for things that are mind-expanding is trickier, despite a similar dopamine rush, because those signals are not recorded implicitly.) Generally, most of the effort in building recommendation tools goes into overcoming two issues: 1) how to capture a user’s preferences implicitly, without having them state those preferences explicitly; 2) how to create events that push a user to keep updating their preference set (do they like this song?) and values (how much do they like it?) as they consume more media.
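
As a minimal sketch of the first issue, implicit signals can be folded into a per-genre preference score. The event names and weights below are hypothetical (a real service would learn them), and the exponential time decay with a 30-day half-life is one assumed way of letting the profile keep updating as the user listens, which touches on the second issue.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical weights for implicit signals; hand-picked here purely for illustration.
SIGNAL_WEIGHTS = {"full_play": 1.0, "early_skip": -1.0, "repeat": 1.5, "save": 2.0}


@dataclass
class ListeningEvent:
    genre: str
    signal: str      # one of the keys of SIGNAL_WEIGHTS
    days_ago: float  # how long ago the event happened


def implicit_genre_scores(events, half_life_days=30.0):
    """Sum weighted signals per genre, halving an event's influence every `half_life_days`."""
    scores = defaultdict(float)
    for event in events:
        decay = 0.5 ** (event.days_ago / half_life_days)
        scores[event.genre] += SIGNAL_WEIGHTS[event.signal] * decay
    return dict(scores)


if __name__ == "__main__":
    history = [
        ListeningEvent("avant-garde jazz", "save", days_ago=200),  # saved long ago
        ListeningEvent("avant-garde jazz", "early_skip", days_ago=1),
        ListeningEvent("electropop", "full_play", days_ago=2),
        ListeningEvent("electropop", "repeat", days_ago=3),
    ]
    print(implicit_genre_scores(history))
```

Here electropop ends up well ahead of avant-garde jazz even though nothing asked the user which they prefer, which is exactly the gap between stated and revealed taste described above.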

Although algorithmic recommendation is popular in many domains (film, books, retail), it is particularly elusive and tricky in music because of the diversity of contexts around listening. Within a single day, the same user can go from actively listening to several genres for a cerebral experience, to letting someone else ‘pick the next song’, to enhancing an activity (working out, partying, working), to simply letting ‘something play through’. These scenarios have to be observed by people first, before developers can set flags that detect what the user is doing from their listening pattern alone. You are not only supposed to personalize, but to be clairvoyant. The ingredients essential to good, effective music curation fall under either ‘context-based features’ or ‘content-based features’.

By cleverly combining data points, you can de-compartmentalize and make meaningful statements such as: “User [x] prefers hip-hop on the weekends, knows a lot about post-rock given their familiarity with less popular bands in that genre, and enjoys what is on the periphery of jazz. They prefer listening to albums rather than singles when they are driving, and they have a soft spot for songs with Spanish lyrics. They are open to exploring and love variety, but generally avoid high-tempo music.” This is the power that content and context bring when used together. If you feel there is something vaguely dirty about disregarding what a user says about themselves and going off their listening history and likes/dislikes only, that’s because there is. Isn’t it time to consider that certain users would like to directly control the trajectory of their taste and of the music discovery that follows?
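
A statement like “prefers hip-hop on the weekends” can be derived by comparing how often a genre shows up in a given context with how often it shows up overall (its lift). The sketch below assumes a hypothetical listening log in which each play carries a genre plus context fields such as `day` or `activity`; the 1.5 lift threshold is arbitrary.

```python
from collections import Counter, defaultdict


def context_preferences(log, context_key, min_lift=1.5):
    """Return {(context_value, genre): lift} for genres over-represented in a context.

    lift = P(genre | context) / P(genre); values above 1 mean the genre is
    played more in that context than the user's overall habits would predict.
    """
    total = len(log)
    overall = Counter(play["genre"] for play in log)
    by_context = defaultdict(Counter)
    for play in log:
        by_context[play[context_key]][play["genre"]] += 1

    preferences = {}
    for context_value, genre_counts in by_context.items():
        n_context = sum(genre_counts.values())
        for genre, count in genre_counts.items():
            lift = (count / n_context) / (overall[genre] / total)
            if lift >= min_lift:
                preferences[(context_value, genre)] = lift
    return preferences


if __name__ == "__main__":
    log = (
        [{"genre": "hip-hop", "day": "weekend"} for _ in range(4)]
        + [{"genre": "post-rock", "day": "weekday"} for _ in range(4)]
        + [{"genre": "hip-hop", "day": "weekday"}]
    )
    for (day, genre), lift in context_preferences(log, "day").items():
        print(f"prefers {genre} on the {day} (lift {lift:.1f})")
```

Swapping `"day"` for `"activity"` (driving, working out) yields the other kinds of statement in the example; content features such as tempo or lyric language would need their own fields in the log.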

Giving back to users: provide access to discovery tools

As music streaming services (which have almost identical libraries, prices, and quality) took over, the only reason left for a user to pick one over another became how well a service helps them make a choice, either by reading into their existing interests or into their desired future trajectories. I am not of the naive opinion that algorithms only amplify popularity; there is no doubt that music discovery is now enhanced for the majority of people. Every now and then I read comments on how algorithms are destroying our ability to discover music on our own, and how the long tail of music is not happening. Obscurity will always exist, and in the past, discovery was taken up only by professionals or by people who felt especially passionate about systematically searching and listening to mediocre work in the hope of one day finding something worthy. Meanwhile, the majority of people (who, by the way, were already interested in music and didn’t just tune in to the radio or the top 100) waited for those findings to be aggregated and uploaded to some list (hello, Blalock’s indie rock playlist). So I will not boldly claim that a world without algorithmic curation is better or fairer to artists (in terms of listen counts, financial gain aside) than a world with it. Now that I have clarified my stance, I feel safe suggesting that there is much room for improvement in supporting a listener’s music discovery journey beyond pushing lists of songs to their feed.

It is not hard to see the problem with how music recommendation is done today: it gives you more of what you enjoy, based on what you enjoyed, based on what was recommended to you. It’s only a matter of time until this becomes a closed-loop system, and then… uh oh. There is a real chance that this will eventually amputate ‘organic’ input from other systems, and this is worrisome.

“We should get good enough that you don’t have to take your phone out of your pocket to get the right stuff.” – Jim Lucchese (Echo Nest)

While I agree with the vision of there being no more frustration to get to ‘the right stuff’, I am uncomfortable with the ultimate goal being to discredit the ‘choice’ of a user, even if not blatantly. Yes, your past choices technically dictate your future recommendations, but where does your future self come in? Music exploration and discovery should not feel like frustrating work, and should not be bypassed either. It’s not a responsibility; it’s an extension of pleasure.

My approximation of the process that ultimately ends in algorithmic curation

The biggest change I am hoping for is a move away from delivering an ‘end product’ based on a user’s actions, toward giving users a voice again. To no longer say “hey, I know you better than you know yourself because that is what you currently do”, but “hey, it seems like your listening habits shape our recommendations to you, so why don’t you take charge for a bit, should you wish? Here is an amazing discovery tool for navigating your own universe.” This is the future of personalization.
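
One way to picture “take charge for a bit” is a single control that the listener owns. The sketch below assumes each candidate track already has a predicted-enjoyment score (from listening history) and a novelty score (distance from the current taste profile), both hypothetical values in [0, 1], and lets the user’s `exploration` setting decide how the two are blended.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    predicted_enjoyment: float  # 0..1, hypothetical output of a history-based model
    novelty: float              # 0..1, hypothetical distance from the current taste profile


def steer(candidates, exploration, k=3):
    """Rank candidates by a user-controlled blend of enjoyment and novelty.

    exploration=0 reproduces a pure "more of what you like" feed;
    exploration=1 ranks purely by how far a track is from the current profile.
    """
    if not 0.0 <= exploration <= 1.0:
        raise ValueError("exploration must be between 0 and 1")

    def score(c):
        return (1.0 - exploration) * c.predicted_enjoyment + exploration * c.novelty

    return [c.title for c in sorted(candidates, key=score, reverse=True)[:k]]


if __name__ == "__main__":
    pool = [
        Candidate("familiar electropop", predicted_enjoyment=0.9, novelty=0.1),
        Candidate("adjacent synthwave", predicted_enjoyment=0.7, novelty=0.6),
        Candidate("avant-garde jazz", predicted_enjoyment=0.3, novelty=0.9),
    ]
    for setting in (0.0, 0.5, 1.0):
        print(setting, steer(pool, setting, k=1))
```

The blend is deliberately naive; what matters in this design is who holds the dial, not the formula behind it.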

The next leap in music discovery will only be made by empowering listeners

Regardless of how this will change and shape art, I look forward to it. Still, I hope a sophisticated discovery tool never becomes the only method for music exploration. Discovery, by nature, implies going to different places and embracing chaos, and having one place assert algorithmic control over what you listen to (no matter how gentle that control is, and no matter how hard it tries to place your taste profile into ‘buckets’) is problematic. We need people to keep doing step 1; thanks to the subsequent steps, the days of limited navigability through this giant library will soon be behind us.