Sabuny/Data-Cleaning

Data Cleaning Projects

Overview

This repository is a collection of projects focused on cleaning, preprocessing, and preparing datasets for further analysis or modeling. Data cleaning is a crucial step in any data workflow, and this repository shows how Python libraries can be used to handle common data quality problems.

Objective

I want this repository to serve as a resource for anyone interested in learning how to clean and preprocess datasets.

Features

  • Handling missing values

  • Detecting and removing duplicates

  • Standardizing text

  • Parsing and formatting dates

  • Feature engineering

  • Outlier detection

  • Data type conversions and validations
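A minimal sketch of handling missing values with pandas, assuming a small hypothetical DataFrame (the columns `age`, `region`, and `income` are illustrative and not taken from the projects in this repository). Numeric gaps are filled with the column median and text gaps with a placeholder; `df.dropna()` is the usual alternative when incomplete rows can simply be discarded.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "age": [25, np.nan, 40, 31],
    "region": ["north", None, "south", "east"],
    "income": [50000.0, 62000.0, np.nan, 58000.0],
})

# Count missing values per column before cleaning.
missing_before = df.isna().sum()

# Numeric columns: fill gaps with the median, which is less sensitive to outliers than the mean.
for col in ["age", "income"]:
    df[col] = df[col].fillna(df[col].median())

# Text column: fill gaps with an explicit placeholder so they stay visible in later analysis.
df["region"] = df["region"].fillna("Unknown")
```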
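Detecting and removing duplicates can be sketched as follows, again on hypothetical data. `duplicated()` flags rows that repeat an earlier row exactly, while `drop_duplicates(subset=...)` de-duplicates on a chosen key. Which key is correct depends on the dataset, so treating `customer_id` as the key here is only an assumption.

```python
import pandas as pd

# Hypothetical records: row 2 repeats row 1 exactly; rows 3 and 4 share an id but differ in email.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 3],
    "email": ["a@example.com", "b@example.com", "b@example.com",
              "c@example.com", "c@example.org"],
})

# Flag exact duplicates (all columns equal); the first occurrence is treated as the original.
exact_dupes = df.duplicated()

# Drop exact duplicates only.
deduped = df.drop_duplicates()

# Drop duplicates on the key column, keeping the first row for each id.
by_key = df.drop_duplicates(subset="customer_id", keep="first").reset_index(drop=True)
```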
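For parsing dates and simple feature engineering, one possible approach is to convert strings with `pd.to_datetime` and derive calendar features through the `.dt` accessor. The column name `order_date` and the input format are assumptions; `errors="coerce"` turns unparseable or impossible dates (such as 30 February) into `NaT` instead of raising an error.

```python
import pandas as pd

# Hypothetical raw dates stored as strings, including an invalid date and a missing value.
df = pd.DataFrame({"order_date": ["2023-01-15", "2023-02-30", "2023-03-05", None]})

# Parse with an explicit format; invalid or missing entries become NaT.
df["order_date"] = pd.to_datetime(df["order_date"], format="%Y-%m-%d", errors="coerce")

# Feature engineering: derive calendar features from the parsed column (NaT rows yield NaN).
df["year"] = df["order_date"].dt.year
df["month"] = df["order_date"].dt.month
df["weekday"] = df["order_date"].dt.day_name()

# Reformat as day/month/year strings for display or export.
df["order_date_str"] = df["order_date"].dt.strftime("%d/%m/%Y")
```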
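Outlier detection can take many forms. One common choice, shown here as an illustration rather than the method used in any particular project, is Tukey's IQR rule: a value is flagged if it lies more than 1.5 times the interquartile range below the first quartile or above the third. Capping with `clip` is included as an alternative to dropping the flagged rows.

```python
import pandas as pd

# Hypothetical prices with one obvious outlier.
prices = pd.Series([12.0, 14.0, 13.0, 15.0, 14.0, 13.0, 120.0], name="price")

# Quartiles and interquartile range (pandas uses linear interpolation by default).
q1, q3 = prices.quantile(0.25), prices.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag values outside the fences, then either drop them...
is_outlier = (prices < lower) | (prices > upper)
cleaned = prices[~is_outlier]

# ...or cap them at the fences instead of discarding the rows.
capped = prices.clip(lower=lower, upper=upper)
```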
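Data type conversions and validations might look like the sketch below, assuming columns that arrive as strings (as they often do when read from CSV files). `pd.to_numeric(errors="coerce")` marks unparseable entries as missing, the nullable `Int64` dtype keeps an integer column integral despite gaps, and a boolean mask collects rows that fail validation. The columns and the non-negative price rule are hypothetical.

```python
import pandas as pd

# Hypothetical raw data where every column was read as text.
df = pd.DataFrame({
    "quantity": ["3", "5", "seven", "2"],
    "price": ["9.99", "12.50", "4.00", "n/a"],
    "in_stock": ["yes", "no", "yes", "no"],
})

# Convert numeric text; anything unparseable becomes missing.
df["quantity"] = pd.to_numeric(df["quantity"], errors="coerce").astype("Int64")
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# Map text flags onto real booleans.
df["in_stock"] = df["in_stock"].map({"yes": True, "no": False})

# Validation: collect rows that failed conversion or break a (hypothetical) business rule.
invalid = df[df["quantity"].isna() | df["price"].isna() | (df["price"] < 0)]
```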

Python libraries

Pandas: For data manipulation and analysis

NumPy: For numerical operations

FuzzyWuzzy: For text matching and cleaning

Matplotlib & Seaborn: For visualizing data during the cleaning process
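Text standardization usually combines normalization (case, whitespace, punctuation) with fuzzy matching against a list of known values. The sketch below uses the standard library's `difflib.get_close_matches` as a stand-in so it runs without extra dependencies; with FuzzyWuzzy, `process.extractOne` plays a similar role, although it scores on a 0 to 100 scale. The canonical list and the 0.8 similarity cutoff are assumptions to tune per dataset.

```python
import difflib

import pandas as pd

# Hypothetical messy city names: stray spaces, mixed case, punctuation and a typo.
raw = pd.Series(["  New York", "new york ", "NEW YORK", "Nwe York", "Boston", "boston."])

# Step 1: normalize case, surrounding whitespace, punctuation and internal spacing.
clean = (
    raw.str.strip()
       .str.lower()
       .str.replace(r"[^\w\s]", "", regex=True)
       .str.replace(r"\s+", " ", regex=True)
)

# Step 2: snap near-matches onto a canonical list (difflib used in place of FuzzyWuzzy).
CANONICAL = ("new york", "boston")


def to_canonical(value, choices=CANONICAL, cutoff=0.8):
    """Return the closest canonical value, or the input unchanged if nothing is close enough."""
    matches = difflib.get_close_matches(value, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else value


standardized = clean.map(to_canonical)
```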
