DevOps for data scientists

Webpage: https://skaftenicki.github.io/ku_devops/.

This repository contains a small introduction to developer operations (DevOps) for students in the course Grundlæggende Data Science (GDS) at Copenhagen University. The four core topics covered are:

You are supposed to do them in the order listed. When doing the exercises, to maximize your DevOps experience you should prioritize:

  1. Make yourself familiar with running commands in the terminal. The terminal can be a scary place, but it is an essential skill to be able to run commands without relying on a graphical interface. If you want a good introduction to using the shell, I highly recommend the first two lectures from this MIT course.

  2. Only use scripts, i.e. no notebooks, for these exercises. Notebooks have their benefits, but real-world software development is done in scripts. Therefore, whenever you write code for the exercises, do it in .py scripts. If you miss the interactivity of notebooks when working with scripts, I can highly recommend giving ipython a spin.

  3. Get a good code editor, and try using it. If you do not have one, I can highly recommend Visual Studio Code, which is a lightweight editor that can become powerful through extensions. Otherwise, I also recommend PyCharm.
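
As a rough illustration of point 2, a script for the exercises could be laid out as in the sketch below. The file name `average.py`, the function `mean` and the `--numbers` option are hypothetical and only serve as an example; the exercises do not prescribe this layout.

```python
# average.py (hypothetical file name, used only for illustration)
import argparse


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def main(argv=None):
    # With argv=None, argparse reads the arguments given in the terminal.
    parser = argparse.ArgumentParser(description="Average a few numbers.")
    parser.add_argument("--numbers", nargs="+", type=float, default=[1.0, 2.0, 3.0])
    args = parser.parse_args(argv)
    result = mean(args.numbers)
    print(f"mean = {result}")
    return result


if __name__ == "__main__":
    # Only runs when the file is executed as a script, not when it is imported.
    main()
```

From the terminal this would be run as `python average.py --numbers 2 4`. Inside ipython, `%run average.py --numbers 2 4` executes the same script and leaves `mean` available for interactive use afterwards.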

Why should a data scientist care about DevOps? Because DevOps provides processes and tools for creating reproducible experiments at scale when working with any kind of computer science/data science. Being able to ensure that your experiments are reproducible is important in the context of the scientific method:


Image credit

Without reproducibility, the method breaks at the experimental stage, as non-reproducible experiments will most likely lead to different results and thereby different conclusions on the initial hypothesis.
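
One small piece of reproducibility can be sketched in code: fixing the seed of a random number generator so that a rerun gives the same result. The function `run_experiment` below is a hypothetical toy experiment, not something taken from the exercises, and seeding is only one of several ingredients of a reproducible setup.

```python
import random


def run_experiment(seed, n_samples=1000):
    """Toy experiment: estimate the mean of uniform random numbers in [0, 1)."""
    rng = random.Random(seed)  # local generator, seeded explicitly
    samples = [rng.random() for _ in range(n_samples)]
    return sum(samples) / n_samples


first = run_experiment(seed=42)
second = run_experiment(seed=42)
other = run_experiment(seed=7)

print(first == second)  # same seed, so the two runs agree exactly
print(first, other)     # a different seed generally gives a different estimate
```

Without the explicit seed, each run would draw different samples, and two people running the same script could end up with different estimates.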

For a much more complete set of material on this topic, see this course, which covers nearly the complete pipeline of designing, modeling and deploying machine learning applications.