
This repository hosts our Data Mining project at the University of Pisa, where we explore advanced preprocessing and classification techniques.


vpicchianti/data_mining


Data Mining project

This repository contains the code and the report for a data mining project on advanced preprocessing techniques and on classification and regression algorithms. The project covers feature selection, outlier detection, imbalance learning, classification, and regression, and closes with a focus on explainability.

Project Structure

  • Data Preparation: exploration of preprocessing methods, including feature selection, outlier detection (several families of methods), and imbalance learning techniques (both undersampling and oversampling).
  • Classification and Regression: implementation and evaluation of classification and regression algorithms, including SVM, Random Forest, XGBoost, Logistic Regression, and others.
  • Time Series Analysis: in this part of the project, the dataset consists of time series extracted from audio files. We implemented and evaluated classification algorithms such as ROCKET (Random Convolutional Kernel Transform), K-Nearest Neighbors (KNN), and Shapelets on these audio-derived time series. We also applied clustering techniques with different distance metrics, as well as motif and discord discovery.
  • Explainability: the final part of the project focuses on the interpretability of the models developed throughout the project, aiming to provide insight into their decisions and underlying mechanisms.
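To make two of the steps above concrete, here is a minimal, standard-library-only sketch of random oversampling followed by KNN classification of time series with a pluggable distance. It is an illustration under assumptions, not the code in this repository. The function names (`random_oversample`, `dtw`, `knn_predict`), the toy series, and the choice of dynamic time warping as an alternative distance are all hypothetical. The project itself may rely on different libraries and methods.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority samples until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def euclidean(a, b):
    """Point-wise Euclidean distance between two equal-length series."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw(a, b):
    """Dynamic time warping distance with absolute difference as local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def knn_predict(X_train, y_train, query, k=1, distance=euclidean):
    """Majority vote among the k training series closest to `query`."""
    ranked = sorted(zip(X_train, y_train), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up data: three "peak" series and a single "flat" one (imbalanced).
X = [[0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0], [0, 0, 0, 1, 2, 1], [2, 2, 2, 2, 2, 2]]
y = ["peak", "peak", "peak", "flat"]

X_bal, y_bal = random_oversample(X, y)
label = knn_predict(X_bal, y_bal, [0, 0, 0, 0, 1, 2], k=1, distance=dtw)
print(Counter(y_bal), label)
```

Passing the distance as a parameter keeps the classifier unchanged when swapping metrics, which is one simple way to compare distance choices on the same data.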
