This is a mini-project for SC1015 (Introduction to Data Science & AI) that uses an HDB resale dataset from Kaggle. Throughout this project, we use Jupyter Notebook under an Anaconda environment. We aim to conduct a comprehensive analysis to identify the key factors influencing the resale prices of HDB flats, and our analysis seeks to answer the following questions:
- Which variables have the greatest influence on HDB resale price?
- Which model would be ideal for HDB resale price prediction?
- How are HDB resales distributed across Singapore?

Libraries used:
- Pandas
- Seaborn
- Matplotlib
- Sklearn
- Folium

Visualisations used:
- Bar plot
- Box plot
- Violin plot
- Scatter plot
- Time series plot
- Correlation heatmap
- Interactive map of Singapore (in the notebook and as an HTML file)
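
The correlation heatmap listed above is built from a pairwise correlation matrix. The following is a minimal sketch of that step on synthetic data, not the project's actual notebook code: the column names `floor_area_sqm`, `lease_commence_year`, and `resale_price`, and all the generated values, are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the resale data (all values are made up).
rng = np.random.default_rng(0)
n = 500
floor_area = rng.uniform(40, 150, n)
lease_year = rng.integers(1970, 2015, n)
price = 3000 * floor_area + 1500 * (lease_year - 1970) + rng.normal(0, 20000, n)

df = pd.DataFrame(
    {
        "floor_area_sqm": floor_area,        # hypothetical column name
        "lease_commence_year": lease_year,   # hypothetical column name
        "resale_price": price,               # hypothetical column name
    }
)

# Pairwise Pearson correlations between the numeric columns.
corr = df.corr()

# Rank the predictors by absolute correlation with the resale price.
ranking = (
    corr["resale_price"]
    .drop("resale_price")
    .abs()
    .sort_values(ascending=False)
)
print(ranking)
```

Passing `corr` to Seaborn's `heatmap` function (for example with `annot=True`) would draw the matrix as a heatmap.
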

Models used:
- Linear regression
- Ridge regression
- Bayesian ridge regression
- Gradient boosting regressor
- Random forest regressor
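
One way to compare the five models listed above is to fit each on the same train/test split and score it on the held-out data. The sketch below does this with scikit-learn on synthetic data. The features, the target, the hyperparameters, and the choice of R² as the metric are all assumptions for illustration and may differ from what the notebook uses.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import BayesianRidge, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression problem (assumption: stands in for the HDB features).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(400, 3))
y = 5 * X[:, 0] + 2 * X[:, 1] ** 2 + rng.normal(0, 0.1, 400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters here are illustrative, not tuned.
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "bayesian_ridge": BayesianRidge(),
    "gradient_boosting": GradientBoostingRegressor(n_estimators=50, random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Fit every model on the training split and score it on the test split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: R^2 = {score:.3f}")
```

Using a single shared split keeps the comparison fair, since every model is evaluated on exactly the same held-out rows.
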

Conclusions:
- Floor area has the greatest influence on the resale price of HDB flats.
- Ridge and Bayesian ridge regression did not improve on the linear regression model.
- The random forest regressor performed well in predicting the resale price.
- The interactive map shows that most resales occur on the outskirts of Singapore, while the highest-priced resales occur around the central region.

Contributors:
- Singh Janhavee - Data extraction, Data cleaning, Data visualisation, Gradient boosting regression
- Vannes Wijaya - Data cleaning, Data visualisation, Random forest regression, Singapore map
- Thwun Thiri Thu - Data visualisation, Linear regression, Ridge regression, Bayesian ridge regression

References:
- https://www.kaggle.com/datasets/teyang/singapore-hdb-flat-resale-prices-19902020
- https://www.analyticsvidhya.com/blog/2022/04/bayesian-approach-to-regression-analysis-with-python/
- https://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regression.html
- https://towardsdatascience.com/random-forest-regression-5f605132d19d
- https://blog.prototypr.io/interactive-maps-with-python-part-1-aa1563dbe5a9