Some information on the GPU-capable CI runner for GitHub.
On itscrd04:
$ sudo su CI
$ podman container list
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3bb86d52953f localhost/github_runner:latest ./entrypoint.sh 5 months ago Up 6 minutes ago githubCI_container
As user CI (see above), open a shell inside the container:
podman exec -ti githubCI_container /bin/bash
- -t: allocate a terminal (pseudo-TTY)
- -i: interactive (keep stdin open)
Now navigate to _work, where you will find the directories left from the last CI run.
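For a quick look without opening an interactive shell, the same listing can be run directly through podman exec. This is only a minimal sketch: the helper name list_work_dirs is hypothetical, and the path /home/CI/actions-runner/_work is an assumption based on the working directory set in the container manifest further down.

```shell
#!/bin/bash
# Hypothetical helper: list what the last CI run left behind in _work.
# Assumption: the runner lives in /home/CI/actions-runner inside the container.
list_work_dirs() {
    local container="${1:-githubCI_container}"
    podman exec "$container" ls -1 /home/CI/actions-runner/_work
}

# Usage (as user CI on the host):
#   list_work_dirs
#   list_work_dirs githubCI_container
```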
As user CI (see above), restart the container:
podman restart githubCI_container
To set up a GPU-capable container that can run GitHub jobs, the NVIDIA container runtime is needed. On CentOS 8:
sudo yum install podman
curl -s -L https://nvidia.github.io/nvidia-container-runtime/centos8/nvidia-container-runtime.repo > nvidia-container-runtime.repo
sudo mv nvidia-container-runtime.repo /etc/yum-puppet.repos.d/
sudo yum install nvidia-container-runtime
A few customisations are needed:
- Patch /etc/nvidia-container-runtime/config.toml with
[nvidia-container-cli]
no-cgroups = true
- Also, images are by default saved to home (=afs). Fix the storage config in ~/.config/containers/storage.conf.
- Create space to store containers, and fix permissions using, e.g.:
sudo semanage fcontext -a -e /var/lib/containers /data/containers
sudo restorecon -R -vv /data/containers
- Finally, allow containers to be started from systemd, so that podman can run as a service:
sudo setsebool -P container_manage_cgroup on
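The config.toml patch from the first item can be scripted instead of edited by hand. The following is a sketch under assumptions, not the procedure used on the original machine: the helper name patch_no_cgroups is hypothetical, and it assumes the file already contains the [nvidia-container-cli] section header, with the no-cgroups key either present, commented out, or missing.

```shell
#!/bin/bash
# Hypothetical helper: make sure a config file contains "no-cgroups = true".
# Assumption: the [nvidia-container-cli] section header already exists in the file.
patch_no_cgroups() {
    local config="$1" tmp
    tmp=$(mktemp) || return 1
    # Rewrite an existing (possibly commented-out) no-cgroups line.
    sed -E 's/^[[:space:]]*#?[[:space:]]*no-cgroups[[:space:]]*=.*/no-cgroups = true/' \
        "$config" > "$tmp" || return 1
    # If the key was absent, add it right after the section header instead.
    if ! grep -q '^no-cgroups = true$' "$tmp"; then
        awk '{ print } /^\[nvidia-container-cli\]/ { print "no-cgroups = true" }' \
            "$config" > "$tmp" || return 1
    fi
    # Give up without touching the file if the key could not be set.
    grep -q '^no-cgroups = true$' "$tmp" || { rm -f "$tmp"; return 1; }
    # Overwrite in place so that owner and permissions of the file are kept.
    cat "$tmp" > "$config" && rm -f "$tmp"
}

# Usage (needs write access to the file, e.g. from a root shell):
#   patch_no_cgroups /etc/nvidia-container-runtime/config.toml
```

Running the helper a second time leaves the file unchanged, so it is safe to repeat.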
One can use nvidia-smi to test if the GPU is usable inside the container:
podman pull nvidia/cuda:11.1-devel-centos8
# Test that container starts up
podman run --rm --security-opt=label=disable nvidia/cuda:11.1-devel-centos8 nvidia-smi
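For scripted checks, the human-readable nvidia-smi table is awkward to parse. A possible alternative, sketched here as an assumption rather than as part of the original setup, is to run nvidia-smi -L, which prints one line per GPU, and count those lines. The helper name count_gpus is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count GPUs from "nvidia-smi -L" output read on stdin.
# Each GPU is expected on its own line, starting with "GPU <index>:".
count_gpus() {
    # grep -c exits non-zero when it counts 0 lines; "|| true" hides that status.
    grep -c '^GPU [0-9][0-9]*:' || true
}

# Usage:
#   n=$(podman run --rm --security-opt=label=disable \
#         nvidia/cuda:11.1-devel-centos8 nvidia-smi -L | count_gpus)
#   if [ "$n" -gt 0 ]; then echo "GPU visible in container"; else echo "no GPU visible"; fi
```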
We have to put a layer with a few additions on top of NVIDIA's CUDA-capable CentOS 8 container.
The GitHub runner doesn't want to run as root, so we create a user in the container called "CI".
cat > containerManifest <<EOF
FROM nvidia/cuda:11.1-devel-centos8
LABEL maintainer="Stephan"
RUN yum install -y cmake which git libicu lttng-ust vim
RUN useradd CI
USER CI
WORKDIR /home/CI/
RUN mkdir actions-runner && cd /tmp/ && curl -O -L https://github.com/actions/runner/releases/download/v2.274.2/actions-runner-linux-x64-2.274.2.tar.gz && cd /home/CI/actions-runner && tar -xzf /tmp/actions-runner-linux-x64-2.274.2.tar.gz
WORKDIR /home/CI/actions-runner
RUN ./config.sh --unattended --url ${repoURL} --token ${githubToken} --replace --name ${runnerName}
COPY ./entrypoint.sh .
RUN chmod u+x ./entrypoint.sh
CMD [ "./entrypoint.sh" ]
EOF
podman build --tag github_runner -f containerManifest
The three variables are:
- ${repoURL}: the repository URL, e.g. https://github.com/madgraph5/madgraph4gpu
- ${githubToken}: the token that GitHub generates when going here: https://github.com/madgraph5/madgraph4gpu/settings/actions
- ${runnerName}: a name for the runner
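Because the here-document above is unquoted, the shell substitutes these variables at the moment containerManifest is written, and an unset variable silently becomes an empty string. A small guard can catch that before the file is created. This is a sketch; the helper name check_build_vars is hypothetical and not part of the original setup.

```shell
#!/bin/bash
# Hypothetical helper: verify that the three variables used in the manifest are set.
check_build_vars() {
    local name value missing=0
    for name in repoURL githubToken runnerName; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: set the variables, then run the check before "cat > containerManifest".
#   repoURL=https://github.com/madgraph5/madgraph4gpu
#   githubToken=<token from the settings page>
#   runnerName=<name for the runner>
#   check_build_vars || echo "set all three variables first"
```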
The entrypoint.sh is something along the lines of:
#!/bin/bash
RUNNER=/home/CI/actions-runner/run.sh
while true; do
  if ! pgrep -f "${RUNNER}" > /dev/null 2>&1; then
    # Runner hasn't been started yet or exited because of failure / update
    "${RUNNER}"
  else
    # Runner was restarted, and is running in background. Let's find its PID and wait until it exits:
    PID=$(pgrep -f "${RUNNER}") && tail --pid=$PID -f /dev/null
  fi
  sleep 10
done
This is needed since the runner process actions-runner/run.sh exits when the GitHub runner auto-updates. Without the loop, this would stop the container.
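The restart-on-exit idea behind this loop can be tried in isolation, without a runner. The sketch below is a simplified, bounded variant for experimenting, not the script used in the container: the function name supervise and its arguments are hypothetical, and unlike entrypoint.sh it stops after a fixed number of runs and does not look for a runner that restarted itself in the background.

```shell
#!/bin/bash
# Hypothetical helper: run a command repeatedly, restarting it whenever it exits.
# Arguments: <max_runs> <delay_seconds> <command> [args...]
supervise() {
    local max_runs="$1" delay="$2" runs=0 status
    shift 2
    while [ "$runs" -lt "$max_runs" ]; do
        # Run the command in the foreground and record how it ended.
        if "$@"; then status=0; else status=$?; fi
        runs=$((runs + 1))
        echo "run ${runs} exited with status ${status}"
        # Short pause before restarting, like the "sleep 10" in entrypoint.sh.
        sleep "$delay"
    done
}

# Usage:
#   supervise 3 1 false
```

A foreground process that keeps looping like this is what keeps the container alive while the runner replaces itself.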
After creating the container image, one can run it manually using e.g.
# Run container:
# label=disable disables carrying over of SELinux labels for mounts inside the container
podman create --security-opt=label=disable --name githubCI_container github_runner
podman start githubCI_container
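After starting, it can be useful to confirm from a script that the container is really up, rather than reading the podman container list table by eye. A minimal sketch, assuming podman inspect with a Go-template format string; the helper names container_status and is_running are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers: query the state of a container via "podman inspect".
container_status() {
    local container="${1:-githubCI_container}"
    # Prints the state as reported by podman, e.g. "running" or "exited".
    podman inspect --format '{{.State.Status}}' "$container"
}

is_running() {
    [ "$(container_status "$1")" = "running" ]
}

# Usage (as user CI):
#   if is_running githubCI_container; then echo "runner container is up"; fi
```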
- Generate systemd unit file:
podman generate systemd --restart-policy=always -t 10 -n githubCI_container
- Customise and install in e.g. /etc/systemd/system/github-ci.service:
[Unit]
Description=Podman container-githubCI_container.service
Documentation=man:podman-generate-systemd(1)
Wants=network.target
After=network-online.target
[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
Restart=always
ExecStart=/usr/bin/podman start githubCI_container
ExecStop=/usr/bin/podman stop -t 10 githubCI_container
ExecStopPost=/usr/bin/podman stop -t 10 githubCI_container
RuntimeDirectory=github-ci.service
KillMode=none
Type=forking
User=CI
Group=CI
Nice=5
[Install]
WantedBy=multi-user.target default.target
- Start with:
sudo systemctl daemon-reload
sudo systemctl start github-ci.service
- Enable start at boot using:
sudo systemctl enable github-ci.service
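Once the unit is installed, its state can be checked from a script as well. The following is a sketch, assuming the unit name github-ci.service used above; the helper name service_state is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the state of the CI service unit.
service_state() {
    local unit="${1:-github-ci.service}"
    # "is-active" prints the state (e.g. active, inactive, failed) and exits
    # non-zero unless the unit is active; "|| true" hides that exit status.
    systemctl is-active "$unit" || true
}

# Usage:
#   [ "$(service_state)" = "active" ] || echo "github-ci.service is not running"
#   journalctl -u github-ci.service    # show the unit's log
```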