Data Cleaning Projects

Overview

This repository is a collection of projects focused on cleaning, preprocessing, and preparing datasets for further analysis or modeling. Data cleaning is a crucial step in any data workflow, and this repository showcases how Python libraries can be used to handle common data quality problems such as missing values, duplicates, and inconsistent formats.

Objective

I want this repository to serve as a resource for anyone interested in learning how to clean and preprocess datasets.

Features

Handling missing values
Detecting and removing duplicates
Standardizing text
Parsing and formatting dates
Feature engineering
Outlier detection
Data type conversions and validations
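
Several of the features above can be sketched in a few lines of pandas. This is a minimal illustration, not code taken from the projects in this repository: the sample data, the column names (`name`, `signup`, `age`), and the `clean` helper are all assumptions made up for the example.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
raw = pd.DataFrame({
    "name": ["  Alice ", "BOB", "bob", None, "Carol"],
    "signup": ["2023-01-05", "2023-02-10", "2023-02-10", "2023-03-01", "not a date"],
    "age": ["34", "41", "41", "29", "52"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply a few common cleaning steps to a copy of df (hypothetical helper)."""
    out = df.copy()
    # Standardize text: trim surrounding whitespace and lower-case the names.
    out["name"] = out["name"].str.strip().str.lower()
    # Handle missing values: here, drop rows that have no name at all.
    out = out.dropna(subset=["name"])
    # Remove duplicates, which only become visible after standardization.
    out = out.drop_duplicates()
    # Parse dates: values that do not match the format become NaT.
    out["signup"] = pd.to_datetime(out["signup"], format="%Y-%m-%d", errors="coerce")
    # Convert types: numeric strings become nullable integers.
    out["age"] = pd.to_numeric(out["age"], errors="coerce").astype("Int64")
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

The order of the steps matters in this sketch: standardizing text before `drop_duplicates` lets "BOB" and "bob" collapse into one row. Whether to drop, fill, or flag missing values depends on the dataset, so dropping rows here is only one possible choice.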

Python libraries

Pandas: For data manipulation and analysis

NumPy: For numerical operations

FuzzyWuzzy: For fuzzy text matching and cleaning

Matplotlib & Seaborn: For visualizing data during the cleaning process
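
To give a feel for what fuzzy text matching does during cleaning, here is a small sketch that maps misspelled values onto a list of known-good values. To keep it runnable without extra installs, it uses `difflib` from the Python standard library as a stand-in for FuzzyWuzzy; the `CANONICAL` list, the `standardize` helper, and the 0.8 cutoff are assumptions chosen for the example.

```python
from difflib import get_close_matches

# Hypothetical list of accepted values, invented for illustration.
CANONICAL = ["germany", "france", "united states"]


def standardize(value, choices=CANONICAL, cutoff=0.8):
    """Return the closest canonical value, or the normalized input if none is close."""
    key = value.strip().lower()
    # get_close_matches returns the best matches whose similarity is >= cutoff.
    matches = get_close_matches(key, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else key


messy = ["Germany ", "germnay", "Frnace", "Brazil"]
standardized = [standardize(v) for v in messy]
print(standardized)  # ['germany', 'germany', 'france', 'brazil']
```

Values with no sufficiently close match, such as "Brazil" here, are left as they are (only trimmed and lower-cased) so they can be reviewed by hand. A lower cutoff catches more typos but also raises the risk of merging values that are genuinely different.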
