This repository introduces basic concepts of data analysis, covering not only the technical content but also a critical view of the data.
-
Python libraries
- NumPy
NumPy is a Python library for working with arrays. It also provides functions for linear algebra, Fourier transforms, and matrices.
- SciPy
SciPy is a scientific computation library that uses NumPy underneath. SciPy stands for Scientific Python. It provides additional utility functions for optimization, statistics, and signal processing.
- Pandas
pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
- StatsModels
statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration.
- Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
- Seaborn
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
- Datetime
The datetime module includes functions and classes for doing date and time parsing, formatting, and arithmetic.
- Threading
The threading module constructs higher-level threading interfaces on top of the lower-level _thread module.
- Speedtest
Command-line interface for testing internet bandwidth using speedtest.net.
- Faker
Faker is a Python package that generates fake data for you. Whether you need to bootstrap your database, create good-looking XML documents, fill-in your persistence to stress test it, or anonymize data taken from a production service, Faker is for you.
- Missingno
Missingno is a Python library that provides the ability to understand the distribution of missing values through informative visualizations.
- FuzzyWuzzy
FuzzyWuzzy is a simple-to-use package that uses Levenshtein distance to calculate the differences between sequences.
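To make the Levenshtein distance mentioned in the FuzzyWuzzy entry more concrete, here is a minimal sketch in plain Python using only the standard library. It is not FuzzyWuzzy's implementation or its scoring formula. The names `levenshtein` and `similarity_ratio` are hypothetical helpers introduced for illustration, and the 0 to 100 scaling is an assumption chosen only to resemble a percentage-style score.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Hypothetical helper: scale the distance to a 0..100 score.
    This normalization is an assumption, not FuzzyWuzzy's formula."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are identical
    return 100.0 * (1 - levenshtein(a, b) / longest)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # prints 3
```

A smaller distance means the two strings are closer; a score like the one from `similarity_ratio` is one way to compare pairs of different lengths on a common scale.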
-
Introduction to Machine Learning
- Scikit-learn
- Classification
- Regression
- Clustering
- Dimensionality reduction
- Model selection
- Preprocessing
- XGBoost
The content related to Data Science (Machine Learning and Deep Learning) is available in another repository.
-
Data Scraping
- Scrapy
- Selenium WebDriver
-
SQL