Working with Docker and containers has quickly become a handy skill for software, data and backend engineers who want to take control of deploying their own services in the cloud. 💪
In this whirlwind introduction we will dive into the basics of this technology and then get hands-on with a toy use case to see it in action! 🚀
This walkthrough is a supporting document for the above-mentioned talk, part of the TechLabs Berlin Summer 23 Workshop Weekend.
Kindly complete these steps prior to attending the workshop! ❤️️
As the title suggests, Docker is our tool of choice for this workshop. It is an open platform that enables developing, shipping and running application code in containers. More about Docker will be covered in our workshop.
Requirement: To prepare for the workshop, we request that you download and install the Docker Desktop application. Please follow through all of the download and installation instructions for your operating system. Get Docker!
For advanced users, feel free to use your tool of choice to install Docker if you do not already have it on your machine. We currently recommend the interactive installer as the simplest way to get Docker installed on Apple Silicon. Additionally, if you are on Apple Silicon you may also have to download and install VirtualBox.
In our demo we will target deploying a simple app with a database. For this purpose we will need to select a database and install the required binaries to work with it on your local machine.
For this tutorial we will be working with PostgreSQL. Please make sure you download and install psql for your operating system. (For Mac users, we recommend the "installing with Homebrew" option.)
Heroku is a platform as a service (PaaS) that enables developers to build, run, and operate applications entirely in the cloud. It is one of the platforms of choice for quickly deploying small, simple apps. We will be using it as part of this workshop to deploy our application to the cloud and make it available to anyone over the internet.
This section is optional, since it mostly serves to demo deployment and is not really part of the tutorial.
Documentation for correct installation of this tool can be found on the official Heroku Dev Center. Please follow all the instructions there to get the tool set up for your operating system.
After you install the CLI, run the heroku login command:
heroku login
This will allow you to authenticate your account via a browser window so that the CLI can store your credentials. You are now set up to work with the Heroku CLI.
The following is a walkthrough of all topics, exercises and lessons in this introductory tutorial to Docker and the container paradigm. We also give a basic introduction to this topic in our intro slides.
We will be taking a toy example as a use case. As a CS student, I often found that outside of outdated tools such as MS Access or OracleDB, we rarely had good tools to practice SQL. To address this need, today, 15 years later, I will strive to build a browser-based tool along the lines of SQLBolt that allows students to play with, test and learn SQL. This will be the end deliverable of today's tutorial.
Let's get started. To begin the setup, first make sure you have cloned our tutorial repo:
$ git clone https://github.com/TechLabs-Berlin/mlds-tools.git
Now let's quickly switch to our work directory for this project:
$ cd mlds-tools/workshops/ww_summer_23
Since we actually plan to make some code changes, let's first switch to a new branch:
$ git checkout -b workshop
To ensure that we are all set to get started with Docker, let's do one final test. Now that you are in the right working directory, fire up the following command:
$ docker run hello-world
If Docker was setup correctly, you should see an output like this:
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub. (arm64v8)
3. The Docker daemon created a new container from that image which runs the executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
If you received any errors, there might be an issue with your installation or Docker setup. We recommend following along with the workshop and reaching out to resolve the issue before attempting the tutorial yourself.
Now let's actually get started with our very first Dockerfile.
A Dockerfile is a document that specifies the target image you want to build. Interacting with Docker will mostly require you to learn its directives in order to define and manipulate your containers. Let's get started with a very simple example.
In our current work directory, let's create a file and simply name it Dockerfile. Now add the following contents to the file and save it:
FROM postgres:alpine
ENV POSTGRES_PASSWORD docker
That's it! Before we take a moment to understand what's really happening in this Dockerfile, let's first see it in action!
To build an image from the Dockerfile you just defined, use the following command:
$ docker build -t postgresdb ./
Once the image is built, use the run command to spin up a fresh Docker container from the image you just built.
$ docker run --name demodb -p 5432:5432 postgresdb
Now we can start to understand the effect of these commands:
- We built a new Docker image from a Dockerfile and tagged it for later use
- We ran the command directing the Docker engine to run the specified image in a new container
- We saw the result: running the container starts a DB that awaits connections on a specific port (a quick sanity check for this follows below)
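As that quick sanity check, you can confirm from a second terminal (the container currently occupies this one) that the image exists and the container is up. Both commands are part of the standard Docker CLI:
$ docker images postgresdb
$ docker ps --filter name=demodb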
As mentioned earlier, we will be using a CLI tool to connect to our locally running database instance. First restart the database instance, this time with an extra option -d (to allow it to run in detached mode). If Docker complains that a container named demodb already exists, that is left over from our first run; we will cover cleaning this up shortly.
$ docker run -d --name demodb -p 5432:5432 postgresdb
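Since the container now runs in the background, its output no longer streams to your terminal. If you ever want to peek at it again (for example, to confirm the DB is ready to accept connections), you can follow the container logs:
$ docker logs -f demodb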
Assuming that you have postgresql installed and the necessary setup for it, you should be able to connect to our DB with psql using the following syntax:
$ psql -d postgres -U postgres -h localhost
Entering the password docker at the prompt will result in a successful connection!
Some basic commands to interact with the DB:
- \d will list all tables
- \q will quit interactive CLI mode
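You can also fire one-off commands without staying in interactive mode, using psql's -c option (it will still prompt for the docker password). For example, a minimal smoke test:
$ psql -d postgres -U postgres -h localhost -c 'SELECT version();'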
To run code for the following exercise, first add the following lines to the bottom of your Dockerfile (right after the two existing directives in the file):
# init db tables that we desire
COPY data/*.csv tmp
COPY *.sql /docker-entrypoint-initdb.d/
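For context: the postgres base image automatically executes any .sql scripts it finds in /docker-entrypoint-initdb.d when a fresh database is initialized, which is what makes this COPY trick work. If you want to double-check that your files actually landed in the image after building, one way is to override the container command with a simple ls (the image's entrypoint will exec it instead of starting the server):
$ docker run --rm postgresdb ls /docker-entrypoint-initdb.d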
Now simply build and run the Docker image once again!
Build:
$ docker build -t postgresdb ./
Run:
$ docker run -d --name demodb -p 5432:5432 postgresdb
Note: we are again passing the extra -d option to our run command.
At this point, you should receive an error when you try to run: Docker will complain that the container already exists. This is left over from our previous run, so we will want to remove that container.
To kill all running containers:
$ docker kill $(docker ps -q)
To remove all stopped containers:
$ docker rm $(docker ps -a -q)
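If you would rather not wipe every container on your machine, a more surgical alternative is to force-remove just the conflicting one by name:
$ docker rm -f demodb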
After making these fixes, simply attempt to run the container again:
$ docker run -d --name demodb -p 5432:5432 postgresdb
You can once again connect to our database; this time you will finally start to see some tables:
$ psql -d postgres -U postgres -h localhost
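Once connected, \d should list the freshly initialized tables. If you prefer a one-off check from the shell, psql's \dt meta-command (which lists tables only) works too:
$ psql -d postgres -U postgres -h localhost -c '\dt'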
Now in this short section of exercises:
- We've learnt how to build images, and how to run and connect to Docker containers
- We've understood the base postgres image and how to add data to the DB at initialization
- We've connected to the DB and managed data with CLI tools!
You can find a cheatsheet reference like this one to collect all the useful commands in one place!
In this section we will illustrate how, armed with knowledge of just a few commands and Docker directives, you can already start to develop some production-grade applications. Let's get started!
Flask is a popular Python framework for setting up a simple web server. In this section of the tutorial, we will focus on how Docker can be used as a reproducible environment for local development and testing. We will build a browser-based interface to query our previously defined database.
Simply build the predefined Dockerfile we have provided for this purpose:
$ docker build -f dockerfiles/app.Dockerfile -t webapp . --no-cache
And now run the container:
$ docker run --name sqlrunner -p 5050:5050 webapp
If you tried out your Flask application, you will have noticed that despite the container launching and running successfully, we still encounter an error when we try to navigate to our website. You should see an error similar to:
UnboundLocalError: cannot access variable 'connection' where it is not associated with a value
This is simply because our Flask app is running in one container and the DB is in its own container; currently they are not linked. To link them, first create a shared network:
$ docker network create OSRNetwork
Then simply re-run our containers, specifying the network they now belong to. First, the database container:
$ docker run -d --name demodb -p 5432:5432 --network OSRNetwork postgresdb
And then with the Flask container:
$ docker run --name sqlrunner -p 5050:5050 --network OSRNetwork webapp
This time, everything should work as per plan.
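If you are curious, you can verify the wiring by inspecting the network; both containers should appear under its "Containers" key in the output:
$ docker network inspect OSRNetwork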
- In this section, we learnt to develop a Flask application within Docker
- We ran into our first network configuration issue and solved it by placing the two containers on a shared network
Here we demonstrate how Docker can be used as a sandbox for testing changes in a secure and robust manner. To try this out ourselves, we have added some code snippets that will let you extend and create new functionality in our app.
Add the following line to the code in templates/base.html, right after the line with the <input> tag, at line 16.
<button type="submit" name="listTables">List Tables</button>
Similarly, in the file app/run.py, add the following snippet at line 21.
if not run_text:
    run_text = "SELECT tablename FROM pg_catalog.pg_tables WHERE schemaname='public'"
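If you want to preview what this fallback query returns before wiring it into the app, you can run it directly against the running database container from your host:
$ psql -d postgres -U postgres -h localhost -c "SELECT tablename FROM pg_catalog.pg_tables WHERE schemaname='public'"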
Now you can simply run the build and run commands again to deploy your Flask app with the additional features.
Simply repeat the steps to add the upload-table feature as well. Add the following line to the code in templates/base.html, right after the line added in the previous step, at line 17.
<button type="submit" name="upload" formaction="/upload" formmethod="GET">Upload Table</button>
Similarly, in the file app/run.py, add a new method after line 50.
@app.route('/upload', methods=['GET', 'POST'])
def upload():
    if request.method == 'POST':
        # check if the post request has the file part
        if 'file' not in request.files:
            flash('No file part')
            return redirect(request.url)
        file = request.files['file']
        # If the user does not select a file, the browser submits an
        # empty file without a filename.
        if file.filename == '':
            flash('No selected file')
            return redirect(request.url)
        file_table, file_ext = file.filename.split(".")
        if file_ext.lower() in ALLOWED_EXTENSIONS:
            filename = secure_filename(file.filename)
            file.save(filename)
            df = pd.read_csv(filename, sep=",")
            print(df)
            engine = create_engine('postgresql+psycopg2://postgres:docker@demodb:5432/postgres')
            df.to_sql(file_table, con=engine, if_exists='fail', index=False)
            return redirect('/')
    return '''
        <!doctype html>
        <title>Upload Table</title>
        <h1>Upload new CSV File</h1>
        <form method="POST" enctype=multipart/form-data>
            <input type=file name="file"><br/><br/>
            <input type=submit value="Upload">
        </form>
        '''
Re-build:
$ docker build -f dockerfiles/app.Dockerfile -t webapp . --no-cache
Re-run:
$ docker run --name sqlrunner -p 5050:5050 --network OSRNetwork webapp
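You can also exercise the new endpoint straight from the terminal with curl, which submits a multipart form just like the browser would. The CSV path below is a placeholder; point it at one of the files in the repo's data/ directory:
$ curl -s -F "file=@data/example.csv" http://localhost:5050/upload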
- We've now learnt how to leverage Docker as a sandbox environment to test changes
So far we've mostly seen how to develop for Docker in isolation. In this final section we are going to demonstrate how Docker enables teams to share their work and collaborate with other developers. Here, we will focus on switching out existing containers for new ones without needing to understand the underlying implementation.
We've developed a slightly polished version of the SQL Runner app used in our workshop. Here, the code has been shared with you in the form of another Docker image. The image's Dockerfile has been provided in dockerfiles/streamlit.Dockerfile.
To build:
$ docker build -f dockerfiles/streamlit.Dockerfile -t streamlit . --no-cache
To run:
$ docker run --name graphicosr -p 8501:8501 --network OSRNetwork streamlit
Take some time to check out all the features we've added in the form of our newly dockerized Streamlit app.
We'll repeat this exercise, but this time we want to plug in a new back end. Assume that this has been deployed with a cloud-based data service such as fly.io or CockroachDB. In our example we went with Cockroach, as it is a great free data-backend hosting service: it reduces cost by using a serverless database while still letting users query it with PostgreSQL. Magic!
To see it in action, we will have to re-build with a few changes. We've fetched connection details from our deployed database cluster on Cockroach Cloud. We will export this string as an environment variable with the name COCKROACH_URL; we've added this to our .zshrc file. If you have never done this before, this is an easy guide to adding permanent environment variables.
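For reference, the line in your .zshrc would look something like this, with the placeholders filled in from your cluster's connection details (never commit the real values anywhere):
$ export COCKROACH_URL="cockroachdb://postgres:{password}@{host}/defaultdb?sslmode=verify-full"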
With this variable now defined, we can proceed to use it in our app/main.py as our default connection. Find and replace every line with a create_engine() invocation with the following line of code:
engine = create_engine(os.environ["COCKROACH_URL"])
The entire line can simply be substituted.
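To locate every line you need to touch, a quick grep does the job:
$ grep -n "create_engine" app/main.py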
Additionally, there are a few commented-out lines in our dockerfiles/streamlit.Dockerfile. Simply uncomment lines 26-28; this will allow a certificate for connecting to Cockroach Cloud to be generated locally in your target Docker image.
Repeat build and run steps to watch your changes in action.
Build:
$ docker build -f dockerfiles/streamlit.Dockerfile -t streamlit . --no-cache
Note: for the sake of security we are not displaying the full, correct connection string here. Never commit connection details like passwords to your repository. Replace all parameters passed in the environment variable with the correct values.
Run:
$ docker run --name graphicosr -p 8501:8501 -e COCKROACH_URL="cockroachdb://postgres:{password}@{host}/defaultdb?sslmode=verify-full" streamlit
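To double-check that the variable actually made it into the container, you can exec into the running container and print it back (assuming the image's base OS ships printenv, as Debian-based Python images do):
$ docker exec graphicosr printenv COCKROACH_URL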
We have also prepared a notebook to help set up and load our data into your CockroachDB cluster.
That brings us to the end of our tutorial section.
- In our final section we demonstrated how to use interchangeable configurations to truly enable the plug-and-playability of Docker for limitless applications and collaboration.
For the last part of our workshop we will talk about some production use cases, the pros and cons of Docker, and some best practices that you can already start to apply.
You can find a summary of the content in our outro slides.
Shout out to Docker for Beginners by Prakhar Srivastav, which is a great one-stop tutorial for all things Docker and certainly takes the intro-level skills illustrated here to the next level!
The Docker official site also hosts some great Hands-on Docker Tutorials for Developers.
Lastly, if you prefer more interactive learning, we recommend checking out Play with Docker. Built by external developers and so impressive that Docker decided to sponsor it, Play with Docker (PWD) aims to offer a simple, interactive and fun playground to learn Docker.
When you finally start to get comfortable with Docker, maybe it's time to graduate to Docker Compose 🚀🤘