This project uses machine learning to predict heart disease from a dataset (heart_disease.csv). The data was preprocessed with label encoding, imputation, and SMOTE for class balancing. Models like SVM, KNN, Random Forest, and Decision Tree were applied and evaluated using accuracy, confusion matrices, and classification reports.
Here is the details :
This project aims to predict heart disease using various machine learning algorithms, based on a dataset (heart_disease.csv
). The data was preprocessed, and multiple classifiers were applied to evaluate the best model for predicting heart disease.
-
Data Preprocessing:
- The dataset was cleaned by handling missing values using the
SimpleImputer
(mean imputation). - Categorical columns were encoded using
LabelEncoder
. - Duplicate rows were removed, and the dataset was reset.
- Class imbalance was handled using
SMOTE
to oversample the minority class.
- The dataset was cleaned by handling missing values using the
-
Exploratory Data Analysis (EDA):
- Various visualizations were created, including:
- Pie charts to show the distribution of heart disease based on gender and fasting blood sugar.
- Histograms to visualize feature distributions.
- A correlation heatmap to identify relationships between features.
- Various visualizations were created, including:
-
Feature Scaling:
- Min-Max scaling was applied to normalize the data.
-
Modeling:
- Several machine learning models were applied:
- Support Vector Machine (SVM): Tuned using
GridSearchCV
for optimal parameters. - K-Nearest Neighbors (KNN): Hyperparameters optimized using
GridSearchCV
and cross-validation. - Random Forest Classifier (RFC): Fitted with the training data.
- Decision Tree Classifier (DTC): Applied with the "entropy" criterion.
- Support Vector Machine (SVM): Tuned using
- Several machine learning models were applied:
-
Model Evaluation:
- Models were evaluated using accuracy scores and confusion matrices.
- Performance metrics (precision, recall, F1-score) were displayed using
classification_report
for each model.
-
Results:
- The SVM model, KNN, Random Forest, and Decision Tree all demonstrated good performance in predicting heart disease.
- Confusion matrices were visualized for each model to assess their ability to correctly classify heart disease cases.
The project successfully applied multiple machine learning models to predict heart disease and evaluated their performance. The data preprocessing, feature scaling, and class balancing techniques contributed to the models' effectiveness.
For questions or feedback, feel free to open an issue or reach out to NASO7Y.
Email: [email protected]
LinkedIn: LinkedIn