Classification with PySpark

Author: Salma OUARDI

In this project, I extracted features from raw input data and trained several classification models using Spark's MLlib library. I then compared the models' performance to identify the best one.

This project is inspired by the book Machine Learning with Spark.

Tasks / Achievements

Used PySpark to extract the appropriate features from raw input data.
Trained a number of classification models using MLlib.
Made predictions with the trained classification models.
Applied a number of standard evaluation techniques to assess the predictive performance of the models.
Explored the impact of parameter tuning on model performance and learned how to use cross-validation to select the best model parameters.
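
To make the evaluation and cross-validation steps above more concrete, here is a minimal sketch in plain Python. It is not the notebook's code and does not use the MLlib API: the function names (confusion_counts, accuracy, precision_recall, k_fold_indices) are hypothetical, labels are assumed to be binary and encoded as 0/1, and the folds are contiguous for simplicity (real data would normally be shuffled first).

```python
from typing import Iterator, List, Sequence, Tuple


def confusion_counts(labels: Sequence[int],
                     predictions: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn), assuming binary labels encoded as 0/1."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    tp = fp = tn = fn = 0
    for y, p in zip(labels, predictions):
        if p == 1 and y == 1:
            tp += 1
        elif p == 1 and y == 0:
            fp += 1
        elif p == 0 and y == 0:
            tn += 1
        else:  # predicted 0, actual 1
            fn += 1
    return tp, fp, tn, fn


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if not labels:
        raise ValueError("cannot compute accuracy on empty input")
    tp, fp, tn, fn = confusion_counts(labels, predictions)
    return (tp + tn) / (tp + fp + tn + fn)


def precision_recall(labels: Sequence[int],
                     predictions: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the positive class (0.0 when undefined)."""
    tp, fp, tn, fn = confusion_counts(labels, predictions)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous folds over range(n)."""
    if k < 2 or k > n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # Spread the remainder over the first n % k folds so sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0]
    y_pred = [1, 0, 0, 1, 1]
    print("accuracy:", accuracy(y_true, y_pred))
    print("precision, recall:", precision_recall(y_true, y_pred))
    for train_idx, test_idx in k_fold_indices(len(y_true), 2):
        print("train:", train_idx, "test:", test_idx)
```

In a cross-validation loop, each candidate parameter value would be trained on the train indices of every fold, scored on the matching test indices, and the candidate with the best average score would be kept.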

The notebook Classification_with_Pyspark.ipynb has a full description of each step of this project.
