Repository used for basic study of tools used for data analysis and visualization.

GuilhermeMonteiroPeixoto/Data-Analysis-and-Visualization

Data Analysis and Visualization

This repository covers basic concepts of data analysis: not only the technical content, but also a critical view of the data.

  • Python libraries

    • NumPy

      NumPy is a Python library for working with arrays. It also provides functions for linear algebra, Fourier transforms, and matrices.

    • SciPy

      SciPy (Scientific Python) is a scientific computation library built on top of NumPy. It provides additional utility functions for optimization, statistics, and signal processing.

    • Pandas

      pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

    • StatsModels

      statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration.

    • Matplotlib

      Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.

    • Seaborn

      Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

    • Datetime

      The datetime module includes functions and classes for doing date and time parsing, formatting, and arithmetic.

    • Threading

      The threading module constructs higher-level threading interfaces on top of the lower-level _thread module.

    • Speedtest

      Command-line interface for testing internet bandwidth using speedtest.net.

    • Faker

      Faker is a Python package that generates fake data for you. Whether you need to bootstrap your database, create good-looking XML documents, fill-in your persistence to stress test it, or anonymize data taken from a production service, Faker is for you.

    • Missingno

      Missingno is a Python library that provides the ability to understand the distribution of missing values through informative visualizations.

    • FuzzyWuzzy

      FuzzyWuzzy uses Levenshtein distance to calculate the differences between sequences in a simple-to-use package.
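
To make the idea of Levenshtein distance concrete, here is a minimal pure-Python sketch. It is not FuzzyWuzzy's own implementation: the function names `levenshtein` and `similarity` are hypothetical, and the 0-100 score is a simple normalization assumed for illustration, not the formula FuzzyWuzzy uses for its ratios.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short

    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical 0-100 score: 100 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


print(levenshtein("kitten", "sitting"))  # 3
print(similarity("data analysis", "data analisys"))
```

A smaller distance means the two strings are closer, which is why this kind of measure is useful for matching names or labels that contain typos.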

  • Introduction to Machine Learning

    • Scikit-learn
      • Classification
      • Regression
      • Clustering
      • Dimensionality reduction
      • Model selection
      • Preprocessing
    • XGBoost
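
As a small illustration of the two standard-library modules listed among the Python libraries above, the sketch below parses timestamps with datetime, shifts them by one day, and runs each conversion in its own thread. The helper `parse_and_shift`, the sample timestamps, and the date formats are assumptions made for this example and are not taken from the repository.

```python
import threading
from datetime import datetime, timedelta


def parse_and_shift(stamp, days, results, index):
    """Parse a timestamp, add a number of days, and store the formatted result."""
    parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M")   # parsing
    shifted = parsed + timedelta(days=days)               # arithmetic
    results[index] = shifted.strftime("%d/%m/%Y %H:%M")   # formatting


# Hypothetical sample data
stamps = ["2021-01-31 08:30", "2021-12-31 23:59"]
results = [None] * len(stamps)

# One thread per timestamp; each thread writes to its own slot in results
threads = [
    threading.Thread(target=parse_and_shift, args=(stamp, 1, results, i))
    for i, stamp in enumerate(stamps)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()  # wait for every worker before reading results

print(results)  # ['01/02/2021 08:30', '01/01/2022 23:59']
```

Writing each result to a fixed index avoids the need for a lock here, because no two threads touch the same list slot.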

The content related to Data Science (Machine Learning and Deep Learning) is available in another repository.

  • Data Scraping

    • Scrapy
    • Selenium WebDriver
  • SQL
