@sevo_jakub

TL;DR A must-read for every CS student and anyone who wants to make something out of their data.

Doing Data Science: Straight Talk from the Frontline is a book based on the course Introduction to Data Science taught at Columbia University in 2012, which is reflected in its structure and content. The book is organized into several topics/lectures, each describing problems that arise when extracting information from data, together with the methods used for various types of data.

The authors describe the data scientist's profile as a mix of several skills: computer science, math/statistics and machine learning, but also domain expertise and data visualization.

The book is intended for those who want an overview of the field, but also for those who want to deepen their understanding of why and how various methods work and what their limitations are. It is organized into several chapters, each devoted to a different problem or type of data, such as time series analysis, spam detection, recommendation, data visualization or social network analysis. Every chapter contains a case study with concrete applications of the described methods, written by an expert who practices them in real projects.

Every topic is written as a separate section of the book, so you don't have to follow the book's structure. Instead, you can choose the topics most relevant to you at the time and study them separately.

The book tries to give the broadest possible general overview of methods commonly used in the field, while at the same time explaining selected methods in detail and providing concrete applications and code examples in Python or R. The authors are aware that every topic they describe could fill a separate book, so they include many links to further resources such as articles and books, but also online courses.

The topics are written for practitioners rather than academics, even though it is sometimes impossible to avoid formulas completely. To fully understand the book, you need basic knowledge of programming and at least some basics of algebra and statistics. Even without them, though, you can expect valuable information about the methods used in the field and insight into the various problems you could face.

Summary: Don't be discouraged by the repeated use of the buzzword Big data. If you can overlook it, you will have a book that provides a comprehensive overview of data science for novices in the field and valuable insights into various commonly used methods for those with some level of expertise.