A movie recommendation system using collaborative filtering and content-based filtering. This project uses the MovieLens dataset and follows the tutorial Building a MovieLens Recommender System.
The MovieLens dataset can be downloaded from the GroupLens website. Several sizes of the dataset are available. However, given the compute and storage constraints of my local machine, the 25 million entries version was downloaded.
Exploratory data analysis was conducted to understand the distribution of movie ratings, the average rating per movie, and to identify the most and least rated movies.
The dataset contains a total of 100,836 ratings. There are 9,742 unique movie IDs and 610 unique users. On average, each user has provided 165.3 ratings, while each movie has received an average of 10.35 ratings. The mean global rating is 3.5, and the average rating per user is 3.66.
The most active user rated 2,698 movies, while the least active user rated 20 movies. The most rated movie has 329 ratings, and the least rated movie has 1 rating.
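The summary figures above can be reproduced with a few pandas aggregations. The following is a minimal sketch, not the notebook's code: it assumes the ratings are loaded into a DataFrame with `userId`, `movieId` and `rating` columns (the MovieLens `ratings.csv` layout), and it uses a tiny made-up table in place of the real file.

```python
import pandas as pd

# Toy stand-in for ratings.csv; the values are invented for illustration.
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 3],
    "movieId": [10, 20, 30, 10, 20, 10],
    "rating":  [4.0, 3.0, 5.0, 2.0, 4.0, 3.0],
})

n_ratings = len(ratings)
n_users = ratings["userId"].nunique()
n_movies = ratings["movieId"].nunique()

summary = {
    "n_ratings": n_ratings,
    "n_users": n_users,
    "n_movies": n_movies,
    "ratings_per_user": n_ratings / n_users,
    "ratings_per_movie": n_ratings / n_movies,
    # Mean over all individual ratings.
    "global_mean_rating": ratings["rating"].mean(),
    # Mean of each user's own mean rating (weights every user equally).
    "mean_of_user_means": ratings.groupby("userId")["rating"].mean().mean(),
}

# Number of ratings per user and per movie, used to find the
# most/least active users and the most/least rated movies.
ratings_by_user = ratings.groupby("userId")["rating"].count()
ratings_by_movie = ratings.groupby("movieId")["rating"].count()

print(summary)
print("most active user:", ratings_by_user.idxmax(), ratings_by_user.max())
print("least rated movie:", ratings_by_movie.idxmin(), ratings_by_movie.min())
```

The global mean and the mean of per-user means generally differ, because the second gives a user with 20 ratings the same weight as one with thousands.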
Drama is the most common genre, with over 4,000 movies.
Comedy follows, with a slightly lower frequency. War, Musical, Western, IMAX, and Film-Noir have the fewest movies.
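Genre frequencies like these can be counted by splitting each movie's genre string. The sketch below assumes the MovieLens convention of one pipe-separated genre string per movie and uses a handful of made-up entries rather than the real `movies.csv`.

```python
from collections import Counter

# Toy stand-in for the `genres` column of movies.csv.
genres_column = [
    "Comedy|Drama",
    "Drama",
    "Comedy|Romance",
    "Drama|War",
]

# A movie with several genres contributes one count to each of them.
genre_counts = Counter(
    genre
    for entry in genres_column
    for genre in entry.split("|")
)

# Genres sorted from most to least frequent.
most_common = genre_counts.most_common()
print(most_common)
```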
Collaborative filtering works on the premise that similar users like similar movies. Here, we transform the movie ratings data into a user-movie matrix, known as the utility matrix. In the utility matrix, rows represent users, columns represent movies, and the matrix entries hold the rating each user gave each movie.
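One way to build such a utility matrix is as a SciPy sparse matrix, since most user-movie pairs have no rating. This is a sketch under assumptions: the column names follow the MovieLens `ratings.csv` layout, the data is a made-up toy table, and the helper names (`user_index`, `movie_index`) are hypothetical rather than taken from the notebook.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy ratings table; the values are invented for illustration.
ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 3, 3],
    "movieId": [10, 30, 10, 20, 30],
    "rating":  [4.0, 5.0, 3.0, 2.0, 4.5],
})

# Map raw IDs to consecutive row/column positions.
user_ids = np.sort(ratings["userId"].unique())
movie_ids = np.sort(ratings["movieId"].unique())
user_index = {int(uid): i for i, uid in enumerate(user_ids)}
movie_index = {int(mid): j for j, mid in enumerate(movie_ids)}

rows = ratings["userId"].map(user_index).to_numpy()
cols = ratings["movieId"].map(movie_index).to_numpy()

# Rows are users, columns are movies, stored values are ratings.
# Unrated pairs are simply absent (they read back as 0).
utility = csr_matrix(
    (ratings["rating"].to_numpy(), (rows, cols)),
    shape=(len(user_ids), len(movie_ids)),
)

# Fraction of user-movie cells that actually hold a rating.
density = utility.nnz / (utility.shape[0] * utility.shape[1])
print(utility.toarray())
print(f"density: {density:.2f}")
```

Keeping the ID-to-position dictionaries around makes it possible to translate matrix indices back to MovieLens IDs later.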
Open the notebook to read the code and see how the recommendation system works using unsupervised k-nearest neighbours and cosine similarity.
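As a rough illustration of the idea (the notebook remains the reference), the sketch below finds similar movies with scikit-learn's `NearestNeighbors` and the cosine metric. Using scikit-learn is an assumption about the notebook, and the toy matrix and the `similar_movies` helper are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Toy utility matrix: rows are users, columns are movies, 0 = not rated.
utility = csr_matrix(np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 0.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
]))

# For movie-to-movie similarity, each movie must be a row vector of
# user ratings, so transpose the user-by-movie matrix.
movie_vectors = utility.T.tocsr()


def similar_movies(movie_idx, k=2):
    """Return the column indices of the k movies nearest to movie_idx
    by cosine distance, excluding the movie itself."""
    # Ask for k + 1 neighbours because the query movie is its own
    # nearest neighbour (distance 0).
    model = NearestNeighbors(n_neighbors=k + 1, metric="cosine", algorithm="brute")
    model.fit(movie_vectors)
    _, indices = model.kneighbors(movie_vectors[movie_idx])
    return [int(i) for i in indices[0] if i != movie_idx][:k]


print(similar_movies(0, k=2))
```

Cosine distance compares the direction of two rating vectors rather than their length, so a movie with few ratings can still be close to a heavily rated one if the same users liked both.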
Building a MovieLens Recommender System
MovieLens Data source