Data Cleaning Projects

Overview

This repository is a collection of projects focused on cleaning, preprocessing, and preparing datasets for further analysis or modeling. Data cleaning is a crucial step in any data workflow, and the projects here show how Python libraries can be used to handle common data quality problems.

Objective

I want this repository to serve as a resource for anyone interested in learning how to clean and preprocess datasets.

Features

  • Handling missing values

  • Detecting and removing duplicates

  • Standardizing text

  • Parsing and formatting dates

  • Feature engineering

  • Outlier detection

  • Data type conversion and validation
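
As a rough illustration of how several of the features above fit together, here is a minimal pandas sketch. The sample data, the column names (`name`, `signup`, `amount`), the `clean` helper, and the 1.5 × IQR outlier rule are all assumptions made for this example; they are not taken from any project in this repository.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
raw = pd.DataFrame({
    "name": [" alice ", "BOB", "bob", "Carol", "dave", "Erin", None],
    "signup": ["2023-01-05", "2023-02-10", "2023-02-10", "not a date",
               "2023-03-01", "2023-03-15", "2023-04-01"],
    "amount": ["10", "12", "12", "11", "13", "1000", "14"],
})


def clean(df):
    """Hypothetical helper that applies a handful of common cleaning steps."""
    out = df.copy()

    # Standardizing text: trim whitespace and normalize capitalization.
    out["name"] = out["name"].str.strip().str.title()

    # Parsing dates: values that do not match the format become NaT.
    out["signup"] = pd.to_datetime(out["signup"], format="%Y-%m-%d", errors="coerce")

    # Data type conversion: strings to numbers, invalid entries become NaN.
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")

    # Feature engineering: derive the signup month from the parsed date.
    out["signup_month"] = out["signup"].dt.month

    # Handling missing values: drop rows that have no name.
    out = out.dropna(subset=["name"])

    # Removing duplicates: rows only match once the text is standardized.
    out = out.drop_duplicates().reset_index(drop=True)

    # Outlier detection: flag amounts outside 1.5 * IQR of the quartiles.
    q1, q3 = out["amount"].quantile([0.25, 0.75])
    iqr = q3 - q1
    inside = out["amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out.assign(is_outlier=~inside)


cleaned = clean(raw)
print(cleaned)
```

The order of the steps matters in this sketch: text is standardized before duplicates are dropped, so that "BOB" and "bob" are recognized as the same record. Whether to drop or impute missing values, and which outlier rule to use, depends on the dataset.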

Python libraries

Pandas: For data manipulation and analysis

NumPy: For numerical operations

FuzzyWuzzy: For text matching and cleaning

Matplotlib & Seaborn: For visualizing data during the cleaning process
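
To give a sense of what text matching looks like during cleaning, the sketch below maps misspelled labels onto a list of canonical values. It deliberately uses `difflib` from the Python standard library as a stand-in so that it runs without extra dependencies; it is not FuzzyWuzzy's API, and the `CANONICAL` list, the `match_label` helper, and the 0.8 cutoff are assumptions made for illustration.

```python
import difflib

# Hypothetical list of canonical labels, invented for illustration.
CANONICAL = ["New York", "Los Angeles", "Chicago"]


def match_label(value, choices=CANONICAL, cutoff=0.8):
    """Return the closest canonical label, or None if nothing is similar enough."""
    # Compare in lowercase so capitalization does not affect the score.
    lowered = {choice.lower(): choice for choice in choices}
    hits = difflib.get_close_matches(
        value.strip().lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[hits[0]] if hits else None


for raw_value in ["new yrok", "Chicgo", "Boston"]:
    print(repr(raw_value), "->", match_label(raw_value))
```

A lower cutoff catches more misspellings but also raises the risk of merging labels that are genuinely different, so matched pairs are worth reviewing before they are applied to a dataset.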