A complete analysis of two methods for finding similar movies with ALS results

In a recommendation system, a common task is finding similar movies or users from the ALS results. ALS matrix factorization makes this easy, because it derives a latent-factor vector for every movie and user, and those vectors can be compared directly.

The question is which similarity metric to choose. There are several popular ways to compute similarity, and the two most common are cosine similarity and Euclidean distance.

Here are the mathematical definitions of each:

Cosine similarity measures the angle between the two vectors:

cosine similarity
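
Written out for two latent-factor vectors of the same length n (the symbols a, b and n are notation assumed here, not taken from the original figure), the standard definition is:

```latex
\cos(\theta) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
             = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \; \sqrt{\sum_{i=1}^{n} b_i^2}}
```

The value ranges from -1 to 1 and depends only on the direction of the vectors, not on their lengths.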

Euclidean distance measures the straight-line distance between two points:
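
To make the difference between the two metrics concrete, here is a minimal sketch in Python with NumPy. Euclidean distance is computed as the square root of the summed squared differences between the two vectors. The function names, the `item_factors` dictionary and the 2-dimensional vectors are all made up for illustration; real ALS item factors would come from the trained model and have more dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (higher means more similar)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    """Straight-line distance between a and b (lower means more similar)."""
    return float(np.linalg.norm(a - b))

def most_similar(item_factors, query_id, metric="cosine", top_n=5):
    """Rank every other item by its similarity to the query item."""
    query = item_factors[query_id]
    score = cosine_similarity if metric == "cosine" else euclidean_distance
    scores = [
        (item_id, score(query, vec))
        for item_id, vec in item_factors.items()
        if item_id != query_id
    ]
    # Cosine: largest first. Euclidean: smallest first.
    return sorted(scores, key=lambda s: s[1], reverse=(metric == "cosine"))[:top_n]

# Hypothetical latent factors, keyed by a made-up movie id.
item_factors = {
    1: np.array([1.0, 0.0]),
    2: np.array([2.0, 0.0]),  # same direction as movie 1, but twice as long
    3: np.array([0.0, 1.0]),  # orthogonal to movie 1
    4: np.array([0.9, 0.1]),  # close to movie 1 in both direction and length
}

by_cosine = most_similar(item_factors, 1, metric="cosine")
by_euclidean = most_similar(item_factors, 1, metric="euclidean")
print([item_id for item_id, _ in by_cosine])
print([item_id for item_id, _ in by_euclidean])
```

In this toy data the two metrics disagree on the nearest neighbour of movie 1: cosine similarity prefers movie 2, which points in exactly the same direction, while Euclidean distance prefers movie 4, which sits closest in space. That sensitivity to vector length is the main practical difference between them.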

A complete record of how I tried to figure out the unexpected results of my movie recommendations

In case you didn’t know, I built a movie recommendation engine based on the ALS model in Apache Spark (GitHub link) and added my own movie ratings to get recommendations for myself.

However, the results were unexpected and unsatisfying: none of the top 20 recommendations seemed attractive to me.

So what went wrong? Frustrated by the result, I decided to figure out what had happened.

For reference, here’s my input:

my ratings as userId = 0

Here are the top 20 recommendations I got:

Z. Cheng