This repository contains the code and the report for a data mining project that explores advanced preprocessing techniques, classification and regression algorithms, and time series analysis. The project covers feature selection, outlier detection, imbalanced learning, classification, and regression, with a final emphasis on explainability. It is organized in four parts:
- Data Preparation: detailed exploration of preprocessing methods, including feature selection, outlier detection (several families of methods), and imbalanced learning techniques (both undersampling and oversampling).
- Classification and Regression: implementation and evaluation of advanced classification and regression algorithms, including SVM, Random Forest, XGBoost, Logistic Regression, and others.
- Time Series Analysis: in this part of the project the dataset consists of time series extracted from audio files. We implemented and evaluated classification algorithms for these audio-derived time series, namely ROCKET (Random Convolutional Kernel Transform), K-Nearest Neighbors (KNN), and Shapelets. We also applied clustering techniques with different distance metrics, as well as motif and discord discovery.
- Explainability: the final part of the project focuses on enhancing the interpretability of the models developed throughout the project, aiming to provide insights into the decisions made by the models and their underlying mechanisms.
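
As an illustration of the time series part, the following is a minimal sketch of nearest-neighbor classification of time series, not the code used in this repository. It assumes dynamic time warping (DTW) as the distance metric, which is only one possible choice among the metrics mentioned above. The function names `dtw_distance` and `knn_predict` and the toy data are hypothetical and use only the Python standard library.

```python
from collections import Counter
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative squared cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(
                cost[i - 1][j],      # repeat b[j-1]
                cost[i][j - 1],      # repeat a[i-1]
                cost[i - 1][j - 1],  # advance both
            )
    return math.sqrt(cost[n][m])


def knn_predict(train_series, train_labels, query, k=1, distance=dtw_distance):
    """Majority vote among the k training series closest to the query."""
    ranked = sorted(
        zip(train_series, train_labels),
        key=lambda pair: distance(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): a "peak" class and a "flat" class.
train_series = [
    [0, 0, 1, 3, 1, 0, 0],
    [0, 1, 3, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0.1, 0, 0.1, 0, 0, 0.1],
]
train_labels = ["peak", "peak", "flat", "flat"]

# A peak shifted in time: DTW can realign it with the training peaks.
query = [0, 0, 0, 1, 3, 1, 0]
prediction = knn_predict(train_series, train_labels, query, k=1)
print(prediction)
```

Passing a different function as `distance` (for example a plain Euclidean distance) is enough to compare metrics on the same data, which mirrors the idea of evaluating several distance metrics.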