Further analysis of my movie recommendation results: what happened to my ALS model?

A complete record of how I tried to figure out the unexpected results of my movie recommendations

Zishan Cheng
5 min read · Jun 14, 2021

In case you didn’t know, I built a movie recommendation engine based on the ALS model in Apache Spark (GitHub link) and added my own movie ratings to get recommendations for myself.

However, the returned results were somewhat unexpected and unsatisfying: none of the top 20 recommendations looked attractive to me.

So what went wrong? I was kind of frustrated by the result and decided to figure out what happened. For your reference:

Here’s my input:

my ratings as userId = 0

Here are the top 20 recommendations I got:

My first assumption was that something had gone wrong during the process. After some research and a careful check of my code, I concluded there was no explicit coding or application error.

Then it must have something to do with the model itself. There are three parameters to set during training: seed, rank, and regParam.
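To make this concrete, here’s a minimal sketch of where those three parameters live, assuming the standard pyspark.ml.recommendation API; the column names are hypothetical and should match your own ratings schema.

```python
from pyspark.ml.recommendation import ALS

# Hypothetical column names; adjust to your own ratings schema.
als = ALS(
    rank=4,                    # number of latent factors
    regParam=0.15,             # regularization strength
    seed=42,                   # random state, for reproducibility only
    userCol="userId",
    itemCol="movieId",
    ratingCol="rating",
    coldStartStrategy="drop",  # drop NaN predictions for unseen users/items
)
```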

Seed can be crossed out immediately because it’s just a random state for reproducibility and shouldn’t affect the model’s performance.

That leaves rank = 4 and regParam = 0.15, which I got from tuning with cross-validation (CV). So was there something wrong with my CV process? For example, maybe my grid search didn’t include a better value to choose from.

Originally, I ran CV with these grids (the circled values were the best parameters chosen):

Now, I retrained CV with a new grid:

and still got the same result.
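For reference, the CV setup looks roughly like this, reusing the `als` estimator from above. The grid values here are illustrative stand-ins for the grids in the screenshots, and `training` is the 80% split of the ratings (more on that below).

```python
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# Illustrative grid values; the actual grids I searched are in the screenshots.
param_grid = (
    ParamGridBuilder()
    .addGrid(als.rank, [4, 8, 12, 16])
    .addGrid(als.regParam, [0.05, 0.1, 0.15, 0.2])
    .build()
)

evaluator = RegressionEvaluator(
    metricName="rmse", labelCol="rating", predictionCol="prediction"
)

cv = CrossValidator(
    estimator=als,
    estimatorParamMaps=param_grid,
    evaluator=evaluator,
    numFolds=4,              # k = 4, discussed further below
)
cv_model = cv.fit(training)  # `training` is the 80% split of the ratings
```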

During my research, I found that a higher rank often leads to a smaller RMSE, and in similar projects other people had gotten their best rank at 8, 10, or 16.

I printed out the CV RMSE from each training run and noticed that my best RMSE is kind of high, around 0.8. But since it’s already the best RMSE I could get, I have to accept rank = 4 and a higher RMSE than others got.
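For anyone who wants to reproduce that printout: the fitted CrossValidatorModel keeps the mean CV metric for each grid point in avgMetrics, in the same order as the param maps.

```python
# avgMetrics holds the mean CV RMSE for each point in the grid,
# in the same order as the param maps.
for params, rmse in zip(cv.getEstimatorParamMaps(), cv_model.avgMetrics):
    chosen = {p.name: v for p, v in params.items()}
    print(chosen, f"-> RMSE = {rmse:.4f}")
```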

the CV RMSE from the last grid search

However, there are still some other aspects of the CV process itself that might be improved. In CV we set two parameters: k, the number of folds, and the ratio for splitting the training and test sets. I chose k = 4 and an 80/20 split. Could these be improved?

After more research, I concluded that: 1) with such a large sample, k = 4 is totally acceptable, both for computational efficiency (no need to follow the k = log N ≈ 12 heuristic) and because each fold still trains on plenty of data, avoiding the high variance that comes from training on too little data each time; 2) an 80/20 split is fine as well and shouldn’t have a critical influence on the model’s performance.
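For completeness, the split itself is a one-liner in PySpark, assuming a hypothetical `ratings` DataFrame:

```python
# 80/20 split of the ratings data; the seed only fixes the shuffle.
training, test = ratings.randomSplit([0.8, 0.2], seed=42)
```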

Also, it’s said that the Spark ALS API supports the assumption that the best parameters found on a sampled subset should work similarly on the full data.

Well, after all the retraining and research on the CV process above, at least I knew that my CV process was correct. That left the ALS model itself.

According to my research on the ALS method, one of its deficiencies is that it’s hard to interpret a recommendation and explain why the model recommends the way it does, so you usually have to accept the results as they are. Could this be my case? Since I had only 4 latent factors, I decided to look at the raw features myself in an attempt to understand the result.
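For anyone following along: the fitted ALSModel exposes the learned factors directly, so pulling them out is straightforward. A sketch, assuming the `cv_model` from earlier:

```python
# The fitted ALSModel exposes the factors as two DataFrames:
#   userFactors: (id, features) per user
#   itemFactors: (id, features) per movie
best_model = cv_model.bestModel

best_model.userFactors.filter("id = 0").show(truncate=False)  # my 4 factors
best_model.itemFactors.show(5, truncate=False)                # movie factors
```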

Here are my latent factors as userId = 0:

Here are the latent factors of the movies I rated:

Here are the latent factors of the top 20 movies recommended by ALS:

At first, I guessed the first factor might be related to the country/language of production, since the only movie in my ratings with a negative value on that factor is also the only movie not in English. However, when I looked at the recommendations, it turned out that Aci Ask (2009) is a Turkish movie with 1.90 on the first factor, while A Summer Story (1988) is a British movie but got -0.42.

Then I looked at the third factor, which is where my user vector puts the most weight and which is consistent across all the movies I rated. When it comes to the recommendations, though, the engine seems to go a bit too far on the third factor, with most of them having values on the scale of 2 or 3. (But that’s just how ALS works: the prediction is the dot product of the user’s and the movie’s latent factor vectors, as sketched below.)
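Here’s a toy illustration of that mechanic; the vectors are hypothetical stand-ins, not my actual factors:

```python
import numpy as np

# Hypothetical factor vectors, for illustration only.
user_vec = np.array([0.1, -0.3, 1.2, 0.4])   # a user's 4 latent factors
movie_vec = np.array([0.5, 0.2, 2.8, -0.1])  # a movie's 4 latent factors

# ALS predicts the rating as the dot product of the two vectors, so a
# movie with a large third factor scores high for a user whose third
# factor dominates -- even if the other factors are a poor match.
predicted_rating = float(user_vec @ movie_vec)
print(predicted_rating)  # 3.31
```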

I couldn’t make much more sense of the factors beyond those guesses. It then occurred to me: why not compare the model’s predictions on the movies I rated against my actual ratings?
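A sketch of that comparison, again assuming the `ratings` DataFrame and `best_model` from earlier:

```python
from pyspark.ml.evaluation import RegressionEvaluator

# Score the movies I actually rated (userId = 0) and compare side by side.
my_ratings = ratings.filter("userId = 0")
predictions = best_model.transform(my_ratings)
predictions.select("movieId", "rating", "prediction").show()

rmse = RegressionEvaluator(
    metricName="rmse", labelCol="rating", predictionCol="prediction"
).evaluate(predictions)
print(f"RMSE = {rmse}")
```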

RMSE = 0.7713415016737766

Okay, at least we could finally see how the predictions err. (Keep in mind that this is already the best choice of rank and regParam for my ALS, so this is the best it could do.) I couldn’t lower the RMSE simply by increasing the rank, because we already know from CV that higher ranks would only do worse in my case.

I came to think of my ALS model as a person who had complied with every command received, followed all the instructions correctly, and tried their best to perform; this is simply the best they could do. (In fact, that’s just like me after the whole training process.)

I guess I’ll reconcile with my ALS model and just respect its work (though I won’t agree with its recommendations at this moment). ❦︎

P.S.: I’ve learned that there are more advanced recommendation methods, like adding bias terms or temporal dynamics to ALS, which may improve the results. The analysis and reflection in this article are just to help me understand the basic ALS method and its deficiencies much more clearly.

Comments and feedback are welcome! ;-)
