GitHub - Sabuny/Data-Cleaning

Data Cleaning Projects

Overview

This repository is a collection of projects focused on cleaning, preprocessing, and preparing datasets for further analysis or modeling. Data cleaning is a crucial step in any data workflow, and this repository showcases the use of Python libraries to handle a range of data quality challenges.

Objective

I want this repository to serve as a resource for anyone interested in learning how to clean and preprocess datasets.

Features

Handling missing values
Detecting and removing duplicates
Standardizing text
Parsing and formatting dates
Feature engineering
Outlier detection
Data type conversions and validations
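
The steps listed above can be sketched in a single pandas pass. This is a minimal illustration, not code taken from the projects in this repository: the `raw` DataFrame, its column names, the `clean` helper, the median fill, and the 1.5 x IQR outlier rule are all assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical survey data; the columns and values are made up for illustration.
raw = pd.DataFrame({
    "name": ["  Alice ", "BOB", "bob", "Carol", "Dan", "Eve", "Finn"],
    "signup": ["2023-01-05", "2023-02-10", "2023-02-10", "not a date",
               "2023-03-01", "2023-03-15", "2023-04-02"],
    "salary": ["50000", "52000", "52000", None, "51000", "49500", "1000000"],
})


def clean(df):
    """Apply one possible version of each cleaning step to a copy of df."""
    out = df.copy()

    # Standardizing text: trim whitespace and normalize capitalization.
    out["name"] = out["name"].str.strip().str.title()

    # Data type conversion: unparseable values become NaN instead of raising.
    out["salary"] = pd.to_numeric(out["salary"], errors="coerce")

    # Parsing dates: unparseable values become NaT.
    out["signup"] = pd.to_datetime(out["signup"], format="%Y-%m-%d", errors="coerce")

    # Removing duplicates: done after standardizing so "BOB" and "bob" collapse.
    out = out.drop_duplicates().reset_index(drop=True)

    # Handling missing values: fill missing salaries with the median (an assumption;
    # dropping the rows or using a model-based fill are equally valid choices).
    out["salary"] = out["salary"].fillna(out["salary"].median())

    # Feature engineering: derive the signup month from the parsed date.
    out["signup_month"] = out["signup"].dt.month

    # Outlier detection: flag values outside 1.5 * IQR of the quartiles.
    q1, q3 = out["salary"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["salary_outlier"] = (out["salary"] < q1 - 1.5 * iqr) | (out["salary"] > q3 + 1.5 * iqr)
    return out


cleaned = clean(raw)
print(cleaned)
```

The order of the steps matters in this sketch: text is standardized and types are converted before duplicates are dropped, so rows that differ only in formatting are treated as the same record.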

Python libraries

Pandas: For data manipulation and analysis

NumPy: For numerical operations

FuzzyWuzzy: For text matching and cleaning

Matplotlib & Seaborn: For visualizing data during the cleaning process
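
To give a feel for the text matching step, here is a rough sketch of fuzzy matching against a list of canonical values. It deliberately uses the standard library's `difflib` as a stand-in so that it runs without extra installs; it is not FuzzyWuzzy itself, and the `similarity` and `best_match` helpers, the threshold of 80, and the country list are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score, ignoring case and surrounding whitespace."""
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return round(100 * ratio)


def best_match(value, choices, threshold=80):
    """Map value to the closest canonical choice, or keep it if nothing is close enough."""
    score, choice = max((similarity(value, c), c) for c in choices)
    return choice if score >= threshold else value


# Hypothetical canonical values and messy inputs.
countries = ["Germany", "France", "United States"]
messy = ["germany ", "Frnace", "Atlantis"]

for entry in messy:
    print(repr(entry), "->", best_match(entry, countries))
```

The threshold controls how aggressive the cleanup is: a lower value merges more variants but risks mapping unrelated entries to the wrong canonical value, so it is worth reviewing the matches by hand on a sample.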

Folders and files

Customer_Survey
HealthCare Survey
Salary_Survey
README.md
