Comes with Python 3.7 and the CUDA version of your choice.
To use a different CUDA combination, change line 1 of the Dockerfile to the right base image from here:
ARG CUDA_VERSION=10.0
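As a sketch of how that ARG is typically consumed (the exact base-image tag below is an assumption; check the repo's Dockerfile for the real one), it feeds into the FROM line:

```dockerfile
# CUDA_VERSION is declared before FROM, so it can be overridden at build time
# with --build-arg. The nvidia/cuda tag shown is an assumed example, not
# necessarily this repo's exact base image.
ARG CUDA_VERSION=10.0
FROM nvidia/cuda:${CUDA_VERSION}-cudnn7-runtime-ubuntu18.04
```

Then build with, for example, `docker build --build-arg CUDA_VERSION=10.1 .` to swap the CUDA version without editing the file.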
Demo:
docker run --gpus all --rm ohjho/py_deep_learning_fastapi:CUDA10.0-Py3.7
This Docker image is essentially a fork of tiangolo's, but with support for CUDA.
The base image claims to auto-tune the Gunicorn settings, but the logic is simply one worker per available CPU core (with a minimum of 2 workers). It is therefore still better to start with 1 worker, run a load test, see how many resources that requires, and scale up the number of workers as needed.
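That heuristic can be sketched in shell (a sketch of the logic, not the base image's actual script; `nproc` is assumed available, as on the Linux base image):

```shell
# One Gunicorn worker per available CPU core, with a floor of 2 workers --
# the same rule the base image's "auto-tuning" boils down to.
workers=$(nproc)
if [ "$workers" -lt 2 ]; then
  workers=2
fi
echo "default worker count: $workers"
```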
My preferred container environment configs for tuning Gunicorn are:
TIMEOUT
: set to 0 because it's hard to keep track of long-running async tasks
GRACEFUL_TIMEOUT
: same as above
WEB_CONCURRENCY
: directly controls the number of workers
For example, the following will launch your API with 2 workers:
docker run --gpus all -d -e TIMEOUT="0" -e GRACEFUL_TIMEOUT="0" -e WEB_CONCURRENCY="2" --rm --name name_your_container name_of_your_image
This directory is used to build the images published on this DockerHub repo.