How does Spotify's recommendation system work?

Ever clicked on a song that ended up blowing your mind? How does Spotify know your musical tastes so well? Read on to find out more about how Spotify recommends songs.
Written on Jan 5, 2023 in Applications


In 2000, psychologists Sheena Iyengar and Mark Lepper published a study, now known as ‘The Jam Experiment’, in their research paper When Choice is Demotivating.

“On one day, shoppers at an upscale food market saw a display table with 24 varieties of gourmet jam. Those who sampled the spreads received a coupon for $1 off any jam. On another day, shoppers saw a similar table, except that only six varieties of the jam were on display. The large display attracted more interest than the small one. But when the time came to purchase, people who saw the large display were one-tenth as likely to buy as people who saw the small display.”

What does that show? Although having a lot of choices is appealing, too many options can overwhelm customers and make them less likely to choose anything at all.

With the rise in popularity of platforms such as YouTube, Spotify and Netflix, an astonishing amount of multimedia content is being uploaded to the internet every day. As of May 2019, more than 500 hours of video were uploaded to YouTube every minute[1]. Access to millions of items without an effective system to help people choose what they want might do more harm than good.

So what is this “effective system”?

Didn’t we all have that one friend we would go to when we wanted to buy, say, a laptop? Another friend when we wanted fashion advice? Today, automated recommendation systems are that friend. From Spotify to Amazon, recommendation systems provide users with high-quality, personalized recommendations.

Music recommendation, in particular, poses some interesting challenges due to the number of diverse genres available and the tendency of users to consume music sequentially. Furthermore, the relatively short duration of music in contrast to film or books makes analyzing audio challenging. However, this also means that songs are “disposable”, lowering the penalty for a bad recommendation.

Currently, music streaming giant Spotify has 286 million active users, 50 million tracks and over 4 billion playlists[2]. One of the reasons why Spotify is a big hit among other online music streaming platforms is the “Discover Weekly” playlist. Every Monday, Spotify gives its millions of users 30 new song recommendations.

Spotify’s recommendations are mostly governed by an AI system called ‘Bandits for Recommendations as Treatments’, or simply BaRT, as described in the article How Spotify's Algorithm Manages To Find Your Inner Groove.

But how does Spotify manage to recommend you that perfect song? 

Tony Jebara, VP of Engineering and Head of Machine Learning at Spotify, explained their framework as a balance of exploration and exploitation in his keynote at TensorFlow World in Santa Clara, California. Exploitation means providing recommendations based on your previous listening habits. Exploration, on the other hand, means recommending content whose engagement is uncertain, and it serves mainly as a research tool for learning how people interact with suggested content.
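To make the exploration/exploitation trade-off concrete, here is a minimal sketch of an epsilon-greedy bandit, one of the simplest bandit strategies. It is not BaRT: the source does not describe BaRT's internals, and the track names, engagement rates and `epsilon` value below are made-up assumptions for illustration.

```python
import random

def choose_track(estimated_reward, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        # Exploration: try a random track to learn how the user reacts to it.
        return rng.choice(sorted(estimated_reward))
    # Exploitation: recommend the track with the highest estimated reward.
    return max(estimated_reward, key=estimated_reward.get)

def update(estimated_reward, play_counts, track, reward):
    """Update the running average reward of the track that was recommended."""
    play_counts[track] += 1
    estimated_reward[track] += (reward - estimated_reward[track]) / play_counts[track]

def simulate(true_rates, epsilon=0.1, rounds=5000, seed=0):
    """Simulate a listener who enjoys each track with a fixed, hidden probability."""
    rng = random.Random(seed)
    estimated_reward = {track: 0.0 for track in true_rates}
    play_counts = {track: 0 for track in true_rates}
    for _ in range(rounds):
        track = choose_track(estimated_reward, epsilon, rng)
        reward = 1.0 if rng.random() < true_rates[track] else 0.0
        update(estimated_reward, play_counts, track, reward)
    return estimated_reward, play_counts

# Hypothetical probabilities that the listener enjoys each track.
true_rates = {"track_a": 0.2, "track_b": 0.5, "track_c": 0.8}
estimates, counts = simulate(true_rates)
print(max(counts, key=counts.get))  # the track recommended most often
```

Most recommendations end up going to the track the listener enjoys most, while the small share of random picks keeps testing the alternatives.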

Recommendation systems can be split into two different classes: collaborative filtering and content-based filtering. Spotify combines both approaches in a hybrid recommender system to give you that familiar but still fresh playlist. Spotify also uses Natural Language Processing (NLP) to analyze news, articles and blogs written on the web about specific songs or artists.

Let’s understand what these mean.

Collaborative Filtering

Imagine you are at an office party. You run into John, the HR guy. You start a conversation about your musical interests and you find out that John has listened to songs A, B, C and D this week. It just so happens that you like songs B, C, D and E. You realise that you both have the same musical taste, so you decide to listen to song A. In turn, you tell John to listen to song E. This is exactly how collaborative filtering works!

An example of collaborative filtering
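The office-party story can be written down in a few lines. This is only a toy sketch: the `min_overlap` threshold is a hypothetical parameter for deciding when two listeners count as having similar taste.

```python
def recommend_from_neighbour(my_songs, neighbour_songs, min_overlap=2):
    """Suggest songs a like-minded listener has played that I have not."""
    shared = my_songs & neighbour_songs
    if len(shared) < min_overlap:
        return set()  # too little in common to trust this neighbour
    return neighbour_songs - my_songs

john = {"A", "B", "C", "D"}
you = {"B", "C", "D", "E"}

print(sorted(recommend_from_neighbour(you, john)))  # ['A']
print(sorted(recommend_from_neighbour(john, you)))  # ['E']
```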


You may have already seen this system in Amazon’s “users who bought this have also bought” and Netflix’s “you may also like” recommendations.

Amazon makes heavy use of an item-to-item collaborative filtering approach

But how does Spotify do it for their 286 million users? It’s just matrix manipulation!
Let’s get down to the math.

Spotify uses user-song play counts as input data. This data is organized into a sparse matrix.

The entries in the rating matrix R represent the number of times a user (row) has listened to a song (column)
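As a rough illustration of what "sparse" means here, the sketch below stores only the non-zero play counts in a dictionary keyed by (user, song). The listening log and the names in it are invented, and a production system would use a dedicated sparse-matrix format rather than a plain dictionary.

```python
from collections import Counter

# Hypothetical listening log: one (user, song) pair per stream.
listening_log = [
    ("alice", "song_1"), ("alice", "song_1"), ("alice", "song_3"),
    ("bob", "song_2"), ("bob", "song_3"), ("bob", "song_3"), ("bob", "song_3"),
]

# Dictionary-of-keys sparse matrix: only non-zero play counts are stored.
play_counts = Counter(listening_log)

def r(user, song):
    """Return r_ui, the play count for a user and a song (0 if never played)."""
    return play_counts[(user, song)]

users = sorted({user for user, _ in listening_log})
songs = sorted({song for _, song in listening_log})
stored = len(play_counts)
print(f"{stored} of {len(users) * len(songs)} entries are non-zero")
```

With millions of users and tracks, almost every entry is zero, so storing only the non-zero ones is what makes the matrix manageable.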

This matrix, called the rating matrix ($R$), is transformed into two matrices: the preference matrix ($P$) and the confidence matrix ($C$).

Let the rating matrix $R$ have elements $r_{ui}$ denoting the play count for user $u$ and song $i$.

The preference matrix $P$ has elements $p_{ui}$. The preference variable indicates whether user $u$ has ever listened to song $i$ and is calculated as follows:
$p_{ui}= \begin{cases} 1, & \text{if } r_{ui} \geq 1\\ 0, & \text{if } r_{ui} = 0 \end{cases}$

This means that if $p_{ui}$ has a value of 1, the user has listened to this song. If it has a value of 0, the user has not streamed this song.

The confidence matrix $C$ has elements  $c_{ui}$. The confidence variable measures how certain we are about this particular preference. It is a function of the play count because songs with higher play counts are more likely to be preferred. If the song has never been played, the confidence variable will have a low value.

$c_{ui}= 1 + \alpha \log(1 + \frac{r_{ui}}{\epsilon})$

where $\alpha$ and $\epsilon$ are hyperparameters.
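The two definitions translate directly into code. In this sketch the default values of `alpha` and `epsilon` are illustrative assumptions, not values confirmed by the source.

```python
import math

def preference(play_count):
    """p_ui: 1 if the user has played the song at least once, otherwise 0."""
    return 1 if play_count >= 1 else 0

def confidence(play_count, alpha=2.0, epsilon=1e-6):
    """c_ui = 1 + alpha * log(1 + r_ui / epsilon)."""
    return 1.0 + alpha * math.log(1.0 + play_count / epsilon)

for plays in (0, 1, 10, 100):
    print(plays, preference(plays), round(confidence(plays), 2))
```

A song that was never played gets the minimum confidence of 1, and the logarithm keeps a song played 100 times from counting 100 times as much as a song played once.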

Using the weighted matrix factorization (WMF) algorithm, the preference matrix $P$ derived from $R$ is approximated by the product of two low-rank matrices ($X$ and $Y$), with each error weighted by the confidence $c_{ui}$.

Weighted Matrix factorization algorithm (Source: Introduction to Recommender Systems in 2019)

The rows of matrix $X$ ($x_{u}$) and the columns of matrix $Y$ ($y_{i}$) denote the latent factor representations of users and songs. 

The latent factors are found by minimizing the objective function given by

$$\min_{x,y} \sum_{u,i} c_{ui}\left(p_{ui}-x_{u}^{T}y_{i}\right)^2 + \lambda \left( \sum_{u} \lVert x_{u} \rVert^2 + \sum_{i} \lVert y_{i} \rVert^2 \right)$$
where $\lambda$ is a regularization parameter. The objective consists of a confidence-weighted squared error term and an L2 regularization term. Alternating least squares is used for optimization.
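Here is a small NumPy sketch of alternating least squares for this objective. It is a toy illustration under stated assumptions, not Spotify's implementation: the play counts, the number of latent factors and the hyperparameter values are invented, and for convenience song vectors are stored as rows of `Y` rather than columns.

```python
import numpy as np

def weighted_als(P, C, n_factors=2, reg=0.1, n_iters=15, seed=0):
    """Alternately solve for user and song factors while holding the other fixed."""
    rng = np.random.default_rng(seed)
    n_users, n_songs = P.shape
    X = rng.normal(scale=0.1, size=(n_users, n_factors))  # one row per user
    Y = rng.normal(scale=0.1, size=(n_songs, n_factors))  # one row per song
    ridge = reg * np.eye(n_factors)
    for _ in range(n_iters):
        # With Y fixed, each user vector is a small weighted ridge regression.
        for u in range(n_users):
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + ridge, Y.T @ Cu @ P[u])
        # With X fixed, each song vector is solved the same way.
        for i in range(n_songs):
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + ridge, X.T @ Ci @ P[:, i])
    return X, Y

def objective(P, C, X, Y, reg):
    """Confidence-weighted squared error plus the L2 regularization term."""
    err = P - X @ Y.T
    return np.sum(C * err ** 2) + reg * (np.sum(X ** 2) + np.sum(Y ** 2))

# Hypothetical play counts: 4 users (rows) by 4 songs (columns).
R = np.array([[5, 3, 0, 0],
              [4, 0, 0, 1],
              [0, 0, 6, 2],
              [0, 1, 5, 0]], dtype=float)
P = (R >= 1).astype(float)
C = 1.0 + 2.0 * np.log(1.0 + R / 1e-6)

X, Y = weighted_als(P, C)
print(objective(P, C, X, Y, reg=0.1))
```

Each inner step has a closed-form solution because the objective is quadratic in the user vectors once the song vectors are fixed, and vice versa, which is why the objective never increases from one iteration to the next.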

Recommendations for each user are made by finding the ‘K’ closest song vectors to that user’s vector, using an approximate nearest neighbour search. In the same way, similar songs can be found by running the search from each song vector.
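The lookup step can be sketched with an exact brute-force search, swapped in here for the approximate nearest neighbour search because it is easier to read and fine for a handful of songs. Ranking by dot product is an assumption, and the vectors are invented two-dimensional examples.

```python
import heapq

def top_k_songs(user_vector, song_vectors, k, already_played=()):
    """Return the k unplayed songs whose vectors score highest for this user."""
    def score(song_id):
        return sum(a * b for a, b in zip(user_vector, song_vectors[song_id]))
    candidates = [s for s in song_vectors if s not in already_played]
    return heapq.nlargest(k, candidates, key=score)

# Hypothetical latent factor vectors.
song_vectors = {
    "song_1": [0.9, 0.1],
    "song_2": [0.8, 0.3],
    "song_3": [0.1, 0.9],
    "song_4": [0.2, 0.7],
}
user_vector = [1.0, 0.2]

print(top_k_songs(user_vector, song_vectors, k=2, already_played={"song_1"}))
# ['song_2', 'song_4']
```

Brute force compares the user against every song, which is too slow for a catalogue of millions of tracks. Approximate methods trade a little accuracy for a large speed-up.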

To summarize, collaborative filtering analyses both your behaviour and the behaviour of others to see if you have similar tastes.

Content-Based Filtering

Collaborative filtering seems to work well. But how do people find a song that hasn't been streamed before?

This problem is known as the “cold start problem”. The cold start problem is a situation where new songs fail to get recommended due to a lack of listening data. Moreover, because collaborative filtering relies on listening data alone, popular songs tend to monopolize recommendations.

Song 4 fails to get recommended 
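A toy example shows why a brand-new song is invisible to collaborative filtering. The scoring rule below (count how much each other listener overlaps with you) is a simplified assumption, and the listening histories are invented.

```python
def neighbour_scores(target, listening_history, catalogue):
    """Score unheard songs by how much their listeners overlap with the target."""
    heard = listening_history[target]
    scores = {song: 0 for song in catalogue if song not in heard}
    for user, songs in listening_history.items():
        if user == target:
            continue
        overlap = len(heard & songs)
        for song in songs - heard:
            scores[song] += overlap
    return scores

catalogue = ["song_1", "song_2", "song_3", "song_4"]
listening_history = {
    "u1": {"song_1", "song_2"},
    "u2": {"song_1", "song_3"},
    "u3": {"song_2", "song_3"},
}

print(neighbour_scores("u1", listening_history, catalogue))
# {'song_3': 2, 'song_4': 0}
```

Nobody has streamed `song_4` yet, so no listener can vouch for it and its score stays at zero no matter how good the song is.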


But how do we overcome this problem? By processing the song itself! Raw audio, however, is difficult to analyse, so a spectrogram is used instead. If you could take a picture of music, it would be a spectrogram!

An example of a spectrogram (Source: Spectrogram)

Spotify processes this raw audio by converting it to a mel spectrogram and passing it through a convolutional neural network (CNN). A mel spectrogram is a time-frequency representation of the audio, where the frequencies are converted to the mel scale. The mel scale is a nonlinear scale that closely approximates the human hearing response.
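To get a feel for the mel scale, here is one widely used conversion formula between hertz and mels. Several variants of the scale exist, and the source does not say which one Spotify uses, so treat the constants below as an assumption.

```python
import math

def hz_to_mel(frequency_hz):
    """Convert a frequency in hertz to mels (one common variant of the scale)."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

for hz in (100, 1000, 4000, 8000):
    print(hz, round(hz_to_mel(hz)))
```

The scale is roughly linear at low frequencies and compresses high ones: the step from 7000 Hz to 8000 Hz covers far fewer mels than the step from 1000 Hz to 2000 Hz, mirroring how we hear pitch differences.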

Here is an example of what the architecture may look like. This architecture comprises four convolutional layers and three fully-connected layers.


An example of a CNN used for content based recommendation (Source: Recommending music on Spotify with deep learning)

After the spectrogram passes through this network, it spits out an understanding of the song, including characteristics like estimated time signature, key, mode, tempo, and loudness.

So, when a new song is found to have similar parameters to other songs you like, Spotify adds it to your playlist.
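One simple way to picture "similar parameters" is to compare feature vectors with cosine similarity. This is a sketch under stated assumptions: the feature values are invented and scaled to the range 0 to 1, averaging liked songs into a taste profile is a simplification, and the source does not say how Spotify actually measures similarity.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical features per song: [tempo, loudness, mode], each scaled to 0..1.
liked_songs = {"liked_1": [0.80, 0.70, 1.0], "liked_2": [0.75, 0.65, 1.0]}
new_songs = {"new_upbeat": [0.78, 0.72, 1.0], "new_ballad": [0.20, 0.30, 0.0]}

# Average the liked songs into a single taste profile.
profile = [sum(column) / len(column) for column in zip(*liked_songs.values())]

ranked = sorted(new_songs,
                key=lambda song: cosine_similarity(profile, new_songs[song]),
                reverse=True)
print(ranked)  # ['new_upbeat', 'new_ballad']
```

Because the comparison uses only the audio-derived features, a song with zero streams can still be ranked.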

Natural Language Processing

Spotify also employs Natural Language Processing (NLP). In 2014, Spotify acquired The Echo Nest. Although we don’t know exactly how Spotify implements these models, we can assume that its approach is similar to the techniques used by The Echo Nest.

NLP models deserve a whole article of their own, but here is a high-level overview of what happens:

These models analyse news articles, blogs and online reviews to compile a list of the most frequently used descriptors for a particular song or artist.
Each of these descriptors, called “cultural vectors”, is associated with a weight that quantifies its relative importance for a given song or artist.
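The two steps above can be sketched with a simple word count. The weighting scheme here (each descriptor's share of all descriptor mentions) is an assumption, since the source does not say how the real weights are computed, and the review snippets and vocabulary are invented.

```python
from collections import Counter

def descriptor_weights(documents, vocabulary):
    """Weight each descriptor by its share of all descriptor mentions."""
    counts = Counter()
    for text in documents:
        for word in text.lower().split():
            word = word.strip(".,!?\"'")  # drop surrounding punctuation
            if word in vocabulary:
                counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.most_common()}

# Hypothetical snippets written about one artist.
documents = [
    "A catchy, upbeat disco anthem.",
    "Upbeat and catchy pop with a disco beat.",
    "Pure pop nostalgia, endlessly catchy.",
]
vocabulary = {"catchy", "upbeat", "disco", "pop", "nostalgia"}

weights = descriptor_weights(documents, vocabulary)
print(weights)
```

Here "catchy" appears in all three snippets, so it receives the largest weight for this artist.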

For example, here are the cultural vectors used to describe the Swedish pop group ‘Abba’:


Top words used to describe ‘Abba’ (Source: How Does Spotify Know You So Well? | by Sophia Ciocca)

Similar to collaborative filtering, these top terms are used to find commonalities between artists, songs and user preferences.

With access to massive amounts of data that they collect from their users, Spotify has been able to use these machine learning models to recommend thirty fresh, but familiar songs every week.

But this is just the beginning of the intersection between music and AI. In the past 50 years, the field of music intelligence has grown to include even music composition, with IBM’s Watson Beat and OpenAI’s Jukebox!
