This project aims to build a predictive model that predicts the lifetime revenue of a American movie.
The project built a model Multilayer-Perceptron, that is able to predict the lifetime revenue of a modern American movie within a 20% error margin 74.2% of times.
To build the model, the project constructed a dataset of American movies produced between 1990 and 2016 that include features: budget, runtime, NYT critics pick, review sentiment polarity, genres, mpaa_rating, cast_score, and director_score. Other models including OLS linear regression, regression tree, and k-Nearest-Neighbours were exmained and trained, but did not achieve better result than baseline.
The python notebook contains the full report and all the related python codes including: data scraping, data cleaning, data transformation, and modelling.