A common task in a recommendation system is finding similar movies or users from the ALS results. With ALS matrix factorization this is straightforward, because each movie and each user is represented by a vector of latent factors that can be compared directly.
The question is which similarity metric to choose. There are several popular ways to compute similarity, and the two most common are cosine similarity and Euclidean distance.
Here are the mathematical definitions of each:
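Cosine similarity measures the cosine of the angle between two vectors A and B, ignoring their magnitudes:
$$\cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$$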
Euclidean distance measures the straight-line distance between two points A and B:
$$d(A, B) = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}$$
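To make the difference between the two metrics concrete, here is a minimal NumPy sketch. It is not the code from my Spark project. The `item_factors` matrix is made-up toy data standing in for ALS latent factors, and `most_similar` is a hypothetical helper written just for this illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    """Straight-line distance between points a and b (0.0 means identical)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))

def most_similar(factors, item, k=2, metric="cosine"):
    """Hypothetical helper: indices of the k rows most similar to row `item`."""
    factors = np.asarray(factors, dtype=float)
    target = factors[item]
    if metric == "cosine":
        norms = np.linalg.norm(factors, axis=1) * np.linalg.norm(target)
        scores = factors @ target / norms
        order = np.argsort(-scores)   # higher similarity first
    else:
        dists = np.linalg.norm(factors - target, axis=1)
        order = np.argsort(dists)     # smaller distance first
    # Drop the query item itself, then keep the top k.
    return [int(i) for i in order if i != item][:k]

# Toy latent factors (assumed values, 2 dimensions for readability).
item_factors = np.array([
    [1.0, 2.0],    # item 0
    [2.0, 4.0],    # item 1: same direction as item 0, twice the magnitude
    [1.5, 1.5],    # item 2: near item 0 in space, different direction
    [-1.0, 0.5],   # item 3: orthogonal to item 0
])

print(most_similar(item_factors, 0, metric="cosine"))     # [1, 2]
print(most_similar(item_factors, 0, metric="euclidean"))  # [2, 1]
```

The two metrics rank the neighbors of item 0 differently: cosine similarity prefers item 1 because it points in exactly the same direction, while Euclidean distance prefers item 2 because it sits closer in space. Which behavior is preferable depends on whether the magnitude of the latent factors carries meaning for the recommendations.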
In case you didn’t know, I built a movie recommendation engine based on the ALS model in Apache Spark (GitHub link) and added my own movie ratings to get personalized recommendations.
However, the results were unexpected and unsatisfying: none of the top 20 recommendations appealed to me.
So what went wrong? I was frustrated with the result and decided to figure out what had happened.
For your reference, here’s my input:
Here are the top 20 recommendations I got: