
Classification with Pyspark

Author: Salma OUARDI


In this project, I extracted features from raw input data and trained several classification models with the MLlib library. I then compared the performance of the models to identify the best one.

This project is inspired by the book Machine Learning with Spark.


Tasks / Achievements

  • Used PySpark to extract the appropriate features from raw input data.
  • Trained a number of classification models using MLlib.
  • Made predictions with our classification models.
  • Applied a number of standard evaluation techniques to assess the predictive performance of our models.
  • Explored the impact of parameter tuning on model performance and learned how to use cross-validation to select the best model parameters.
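
To make the last two points concrete, here is a minimal, library-independent sketch of k-fold cross-validation used to pick a parameter value. It is not the notebook's code: the notebook relies on MLlib, while this sketch uses plain Python so it runs without a Spark installation. The toy data, the nearest-neighbour classifier, and every function name below are hypothetical and chosen only for illustration.

```python
from statistics import mean

# NOTE: illustrative only. All names and data here are hypothetical;
# the notebook itself uses MLlib rather than this hand-written code.


def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def k_fold_indices(n_rows, k_folds):
    """Assign row indices to k disjoint validation folds (round-robin)."""
    return [list(range(i, n_rows, k_folds)) for i in range(k_folds)]


def knn_predict(train_x, train_y, x, n_neighbors):
    """Majority vote among the n_neighbors closest training points (ties -> 0)."""
    by_distance = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))
    nearest = by_distance[:n_neighbors]
    votes_for_one = sum(label for _, label in nearest)
    return 1 if 2 * votes_for_one > len(nearest) else 0


def cross_val_accuracy(xs, ys, n_neighbors, k_folds):
    """Mean held-out accuracy: fit on k-1 folds, score on the remaining fold."""
    scores = []
    for fold in k_fold_indices(len(xs), k_folds):
        held_out = set(fold)
        train_x = [xs[i] for i in range(len(xs)) if i not in held_out]
        train_y = [ys[i] for i in range(len(ys)) if i not in held_out]
        predictions = [knn_predict(train_x, train_y, xs[i], n_neighbors) for i in fold]
        scores.append(accuracy([ys[i] for i in fold], predictions))
    return mean(scores)


def select_best(xs, ys, grid, k_folds):
    """Score every candidate in the grid and return the best one with all scores."""
    scores = {value: cross_val_accuracy(xs, ys, value, k_folds) for value in grid}
    best = max(grid, key=scores.get)  # on ties, the first candidate in the grid wins
    return best, scores


# Toy one-feature data set: small values are class 0, large values are class 1.
xs = [0.0, 0.1, 0.2, 0.3, 0.4, 1.0, 1.1, 1.2, 1.3, 1.4]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

best_n, cv_scores = select_best(xs, ys, grid=[1, 3, 9], k_folds=5)
print("cross-validated accuracy per candidate:", cv_scores)
print("selected n_neighbors:", best_n)
```

The key design point is that each candidate value is scored only on rows that were held out while fitting, so the selected value reflects performance on unseen data rather than on the training set.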

The notebook Classification_with_Pyspark.ipynb has a full description of each step of this project.