Add support for ParMETIS to local Dockerfile #1102

Open
wants to merge 2 commits into
base: main
11 changes: 9 additions & 2 deletions docker/build_docker_oss4local.sh
@@ -33,6 +33,13 @@ else
DEVICE_TYPE="$4"
fi

# process argument 5: support for parmetis
if [ -z "$5" ]; then
USE_PARMETIS="false"
Collaborator

We should note this option in the documentation, since we have changed how the Docker image is built here.

else
USE_PARMETIS="$5"
fi
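
The optional fifth argument could also be resolved with a shell default expansion plus validation. The following is only a minimal sketch, not code from this PR: the helper name `parse_parmetis_flag` is hypothetical, and it assumes the flag should accept exactly `true` or `false`, since any other value would not match a `parmetis-*` stage defined in the Dockerfile.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): resolve an optional
# true/false flag from a positional argument, defaulting to "false".
parse_parmetis_flag() {
    # An unset or empty argument falls back to "false".
    local value="${1:-false}"
    case "$value" in
        true|false)
            echo "$value"
            ;;
        *)
            echo >&2 "ParMETIS flag can only be \"true\" or \"false\", but got '$value'"
            return 1
            ;;
    esac
}

# Possible use inside the build script (assumption, shown as a comment only):
#   USE_PARMETIS="$(parse_parmetis_flag "$5")" || exit 1
```

Rejecting unexpected values in the script would surface a typo before `docker build` starts, rather than as a failure to resolve the final build stage.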

# Copy scripts and tools codes to the docker folder
mkdir -p $GSF_HOME"/docker/code"
cp $SCRIPT_DIR"/local/fetch_and_run.sh" $GSF_HOME"/docker/code/"
@@ -42,7 +49,6 @@ cp -r $GSF_HOME"/inference_scripts" $GSF_HOME"/docker/code/inference_scripts"
cp -r $GSF_HOME"/tools" $GSF_HOME"/docker/code/tools"
cp -r $GSF_HOME"/training_scripts" $GSF_HOME"/docker/code/training_scripts"


# Build OSS docker for EC2 instances that can pull ECR docker images
DOCKER_FULLNAME="${IMAGE_NAME}:${TAG}-${DEVICE_TYPE}"

@@ -55,7 +61,7 @@ elif [[ $DEVICE_TYPE = "cpu" ]]; then
docker login --username AWS --password-stdin public.ecr.aws
SOURCE_IMAGE="public.ecr.aws/ubuntu/ubuntu:22.04_stable"
else
echo >&2 -e "Image type can only be \"gpu\" or \"cpu\", but got \""$DEVICE_TYPE"\""
echo >&2 -e "Image type can only be \"gpu\" or \"cpu\", but got '$DEVICE_TYPE'"
# remove the temporary code folder
rm -rf code
exit 1
@@ -65,6 +71,7 @@ fi
DOCKER_BUILDKIT=1 docker build \
--build-arg DEVICE=$DEVICE_TYPE \
--build-arg SOURCE=${SOURCE_IMAGE} \
--build-arg USE_PARMETIS=${USE_PARMETIS} \
-f "${GSF_HOME}/docker/local/Dockerfile.local" . -t $DOCKER_FULLNAME

# remove the temporary code folder
64 changes: 56 additions & 8 deletions docker/local/Dockerfile.local
@@ -1,4 +1,5 @@
ARG DEVICE=gpu
Collaborator

Do you have any numbers on how long it takes to build the Docker image here? My main concern is that building the ParMETIS image will take too long and will also make the image itself too large.

Contributor Author

ParMETIS dependencies are only added if requested by the user, so the build time should be roughly the same as building the ParMETIS image with the specialized Dockerfile.

Could you clarify if there's a concern I'm missing here? I will add some measurements of build times for the images using the specialized Dockerfile vs. this one.
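
One way such build-time measurements could be collected is sketched below. This is an assumption about the approach, not the method used in this PR: `time_build` is a hypothetical helper that runs any command and prints the elapsed wall-clock time in whole seconds. Cached Docker layers make repeated builds faster, so a fair comparison assumes both builds start from a clean cache.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, discard its output, and print the
# elapsed wall-clock time in whole seconds. Returns the command's exit status.
time_build() {
    local start end status
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$((end - start))"
    return "$status"
}

# Possible use (assumption, shown as a comment only): wrap each build
# command under comparison and record the printed number, for example
#   seconds=$(time_build <build command for the image under test>)
```

The resulting image sizes could be compared in the same run by listing the built images with `docker images`.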

ARG USE_PARMETIS=false
ARG SOURCE

FROM ${SOURCE} as base
@@ -48,6 +49,11 @@ ARG OGB_VERSION=1.3.6
ARG TORCH_VERSION=2.3
ARG TRANSFORMERS_VERSION=4.28.1

# Download dgl files
RUN cd /root; git clone --branch v${DGL_VERSION} --single-branch https://github.com/dmlc/dgl.git
ENV DGL_HOME=/root/dgl
ENV DGLBACKEND=pytorch

FROM base as base-cpu

# Install torch, DGL, and GSF deps that require torch
@@ -75,18 +81,50 @@ RUN TORCH_MAJOR_MINOR=$(echo $TORCH_VERSION | cut -c1-3) && \
transformers==${TRANSFORMERS_VERSION} \
&& rm -rf /root/.cache

FROM base-${DEVICE} as runtime
FROM base-${DEVICE} as parmetis-true

ENV PYTHONPATH="/root/dgl/tools/:${PYTHONPATH}"
# Install MPI and dependencies
RUN apt update && apt install -y --no-install-recommends \
build-essential \
cmake \
libopenmpi-dev \
openmpi-bin \
&& rm -rf /var/lib/apt/lists/*

RUN pip install \
pyyaml \
&& rm -rf /root/.cache

# Download DGL source code
RUN cd /root; git clone --branch v${DGL_VERSION} https://github.com/dmlc/dgl.git
# Install GKLib
RUN cd /root && \
git clone --single-branch --branch master https://github.com/KarypisLab/GKlib && \
cd GKlib && \
make && \
make install

# Copy GraphStorm source and add to PYTHONPATH
RUN mkdir -p /graphstorm
COPY code/python/graphstorm /graphstorm/python/graphstorm
ENV PYTHONPATH="/graphstorm/python/:${PYTHONPATH}"
# Install Metis
RUN cd /root && \
git clone --single-branch --branch master https://github.com/KarypisLab/METIS.git && \
cd METIS && \
make config shared=1 cc=gcc prefix=/root/local i64=1 && \
make install

# Install Parmetis
RUN cd /root && \
git clone --single-branch --branch main https://github.com/KarypisLab/PM4GNN.git && \
cd PM4GNN && \
make config cc=mpicc prefix=/root/local && \
make install

ENV PATH=$PATH:/root/local/bin
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/root/local/lib/
RUN cp /root/local/bin/pm_dglpart /root/local/bin/pm_dglpart3
thvasilo marked this conversation as resolved.

FROM base-${DEVICE} as parmetis-false

# No additional dependencies when not supporting ParMETIS

FROM parmetis-${USE_PARMETIS} as runtime

# Set up SSH access
ENV SSH_PORT=2222
@@ -101,11 +139,21 @@ RUN mkdir -p ${SSHDIR} \

EXPOSE ${SSH_PORT}

ENV PYTHONPATH="/root/dgl/tools/:${PYTHONPATH}"
thvasilo marked this conversation as resolved.


# Copy GraphStorm source and add to PYTHONPATH
RUN mkdir -p /graphstorm
COPY code/python/graphstorm /graphstorm/python/graphstorm
ENV PYTHONPATH="/graphstorm/python/:${PYTHONPATH}"


# Copy GraphStorm scripts and tools
COPY code/examples /graphstorm/examples
COPY code/inference_scripts /graphstorm/inference_scripts
COPY code/tools /graphstorm/tools
COPY code/training_scripts /graphstorm/training_scripts
COPY code/fetch_and_run.sh /graphstorm/fetch_and_run.sh
RUN chmod +x "/graphstorm/fetch_and_run.sh"
Collaborator

Any reason we need to do this now?


CMD ["/usr/sbin/sshd", "-D"]