Predicting Colorado forest cover types using diverse ML models for classification. Baseline creation, feature selection, model comparison, and hyperparameter tuning are used to optimize accuracy on the Forest Cover Type Prediction dataset. Final project for the University of Ottawa Master's Machine Learning course (2023).
- Required libraries: scikit-learn, pandas, matplotlib.
- Execute cells in a Jupyter Notebook environment.
- The uploaded code has been executed and tested successfully within the Google Colab environment.
The task is to classify the Forest Cover Type Prediction dataset into seven cover types: Spruce/Fir, Lodgepole Pine, Ponderosa Pine, Cottonwood/Willow, Aspen, Douglas-fir, and Krummholz.
- The 54 geographical features include 'Elevation', 'Aspect', 'Slope', 'Horizontal_Distance_To_Hydrology', 'Vertical_Distance_To_Hydrology', 'Horizontal_Distance_To_Roadways', 'Hillshade_9am', 'Hillshade_Noon', 'Hillshade_3pm', 'Horizontal_Distance_To_Fire_Points', 'Wilderness_Area1' to 'Wilderness_Area4', and 'Soil_Type1' to 'Soil_Type40'.
- The 'Cover_Type' column represents the target with 7 classes (see the loading sketch below).
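A minimal loading sketch, assuming the Kaggle `train.csv` layout with an `Id` column; the file path is an assumption:

```python
# pip install scikit-learn pandas matplotlib
import pandas as pd

# Load the Kaggle training file (path/filename are assumptions).
df = pd.read_csv("train.csv")

# 54 predictors plus the 'Cover_Type' target.
X = df.drop(columns=["Id", "Cover_Type"])
y = df["Cover_Type"]

print(X.shape)           # expected: (n_samples, 54)
print(y.value_counts())  # sample counts across the 7 cover types
```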
Problem’s Overview:
- Create a conceptual figure showcasing the end-to-end data flow.
- Illustrate insights into the problem through a data-flow visualization.
Dataset’s Overview (EDA):
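Continuing from the loading sketch above, a small EDA sketch of the class balance and the ranges of the continuous features:

```python
import matplotlib.pyplot as plt

# Class balance: how many samples per cover type.
counts = y.value_counts().sort_index()
counts.plot(kind="bar")
plt.xlabel("Cover_Type (1-7)")
plt.ylabel("Number of samples")
plt.title("Class distribution")
plt.show()

# Summary statistics; the binary Wilderness_Area/Soil_Type
# columns show up as 0/1 ranges here.
print(X.describe().T[["mean", "std", "min", "max"]])
```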
General Flowchart:
Visualize Training and Test Sets:
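A split-and-visualize sketch; the 80/20 stratified split is an assumption, not necessarily the notebook's exact setup:

```python
from sklearn.model_selection import train_test_split

# Stratified split keeps the 7 classes proportionally
# represented in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Compare class proportions between the two sets.
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
y_train.value_counts(normalize=True).sort_index().plot(kind="bar", ax=axes[0], title="Train")
y_test.value_counts(normalize=True).sort_index().plot(kind="bar", ax=axes[1], title="Test")
plt.show()
```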
Obtain Baseline Performance:
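A minimal baseline sketch with the two models compared in the feature-selection stage below, using default hyperparameters:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Fit each model on all 54 features and report test accuracy.
for name, model in [
    ("K-Nearest Neighbors", KNeighborsClassifier()),
    ("Decision Tree Classifier", DecisionTreeClassifier(random_state=42)),
]:
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.2%}")
```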
First Improvement Strategy: Feature Selection:
Implement feature selection methods, including:
- Filter selection methods: Information Gain (Mutual Information), Variance Threshold, and Chi-Square (see the sketch after this list).
- Wrapper selection methods: Forward Feature Selection, Backward Feature Elimination, and Recursive Feature Elimination (RFE).
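A sketch of the filter approach, scoring features by mutual information (Information Gain) with `SelectKBest` and sweeping the number of kept features; the 2-20 sweep range is an assumption:

```python
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Sweep the number of kept features and track the best accuracy.
# (For the Chi-Square variant, features must first be made
# non-negative, e.g. with MinMaxScaler.)
best_acc, best_k = 0.0, None
for k in range(2, 21):
    selector = SelectKBest(mutual_info_classif, k=k).fit(X_train, y_train)
    model = KNeighborsClassifier().fit(selector.transform(X_train), y_train)
    acc = accuracy_score(y_test, model.predict(selector.transform(X_test)))
    if acc > best_acc:
        best_acc, best_k = acc, k
print(f"Best: {best_acc:.2%} with k={best_k}")
```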
Proceed with the best-performing feature subset and ML model for subsequent stages.
- Champion Model in Filter Selection: Information Gain
  - Maximum of Feature Selection-K-Nearest Neighbors: 73.97%; best number of selected features (n_components): 12
  - Maximum of Feature Selection-Decision Tree Classifier: 76.66%; best number of selected features (n_components): 8
- Champion Model in Wrapper Selection: Recursive Feature Elimination (see the sketch after the results below)
  - Maximum of Recursive_FE-K-Nearest Neighbors: 73.97%; best number of selected features (n_components): 12
  - Maximum of Recursive_FE-Decision Tree Classifier: 76.26%; best number of selected features (n_components): 10
- Overall Champion: Information Gain (Filter Selection) with the Decision Tree Classifier (76.66% using 8 features)
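For the wrapper side, a minimal RFE sketch with the Decision Tree Classifier; the sweep range mirrors the filter sketch and is likewise an assumption:

```python
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# RFE recursively drops the weakest features until
# n_features_to_select remain, then predicts with the
# wrapped estimator.
best_acc, best_n = 0.0, None
for n in range(2, 21):
    rfe = RFE(DecisionTreeClassifier(random_state=42), n_features_to_select=n)
    rfe.fit(X_train, y_train)
    acc = accuracy_score(y_test, rfe.predict(X_test))
    if acc > best_acc:
        best_acc, best_n = acc, n
print(f"Best: {best_acc:.2%} with n={best_n}")
```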
Adding More Machine Learning Models:
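The additional models are not named in this section; as an illustrative sketch, assuming ensemble and kernel-based classifiers are among the candidates and reusing the 8-feature Information Gain subset chosen above:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import accuracy_score

# Keep the 8 features chosen by the Information Gain filter above.
selector = SelectKBest(mutual_info_classif, k=8).fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Candidate models are illustrative; the notebook's actual list may differ.
for name, model in {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gaussian Naive Bayes": GaussianNB(),
    "SVM (RBF kernel)": SVC(),
}.items():
    model.fit(X_train_sel, y_train)
    acc = accuracy_score(y_test, model.predict(X_test_sel))
    print(f"{name}: {acc:.2%}")
```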