In this project, we develop a regression model to predict the price of used cars from a variety of relevant factors. By building a robust and accurate model, we aim to give buyers and sellers a tool for better estimating the value of a used car and to deliver data-driven intelligence. We train and test our models on a dataset of used car sales that includes car features, mileage, and price. We explore several types of regression models, such as linear regression, decision trees, and random forests, and evaluate how well each one predicts used car prices. We also explore techniques for feature selection and model optimization, with the goal of building a model that is as accurate, efficient, and scalable as possible. Overall, this project aims to contribute to the growing field of predictive analytics and to provide insights into the factors that affect used car prices. The project report can be found here.
- Kaggle US Used cars dataset (https://www.kaggle.com/datasets/ananaymital/us-used-cars-dataset)
- US cities (https://simplemaps.com/data/us-cities)
- US state boundaries (https://github.com/sunny2309/datasets)
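As a minimal sketch of the model comparison described above, the snippet below fits linear regression, a decision tree, and a random forest with scikit-learn and scores each on a held-out split using MAE and R². It runs on synthetic data: the column names (`mileage`, `year`, `body_type`), the price formula, and the hyperparameters are illustrative assumptions, not the project's actual schema, features, or settings.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic listings standing in for the real dataset;
# these columns and the price formula are assumptions for illustration only.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "mileage": rng.uniform(0, 200_000, n),
    "year": rng.integers(2000, 2021, n),
    "body_type": rng.choice(["Sedan", "SUV", "Pickup"], n),
})
body_premium = df["body_type"].map({"Sedan": 0, "SUV": 4000, "Pickup": 6000})
df["price"] = (
    30_000
    - 0.08 * df["mileage"]            # price drops with mileage
    + 800 * (df["year"] - 2000)       # newer cars cost more
    + body_premium                    # body-type premium
    + rng.normal(0, 1500, n)          # unexplained noise
)

X = df.drop(columns="price")
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# One-hot encode the categorical column; pass numeric columns through unchanged.
preprocess = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["body_type"])],
    remainder="passthrough",
)

models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(max_depth=8, random_state=42),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    # clone() gives each pipeline its own unfitted preprocessor.
    pipe = Pipeline([("prep", clone(preprocess)), ("model", model)])
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    results[name] = {
        "mae": mean_absolute_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }

for name, m in results.items():
    print(f"{name:>18}: MAE={m['mae']:,.0f}  R2={m['r2']:.3f}")
```

Because the synthetic price here is linear in its inputs, linear regression is likely to score best on it. On real listings, where depreciation is non-linear and features interact, tree-based models may do better, which is one reason to compare several model families on the same held-out split.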