Stephan Hageboeck edited this page Jun 30, 2021 · 14 revisions

Some information on the GPU-capable CI runner for GitHub.

Checking on the container

On itscrd04,

$ sudo su CI
$ podman container list
CONTAINER ID  IMAGE                           COMMAND          CREATED       STATUS            PORTS   NAMES
3bb86d52953f  localhost/github_runner:latest  ./entrypoint.sh  5 months ago  Up 6 minutes ago          githubCI_container
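
For scripted checks, the same information can be obtained without reading the table by eye. The following is a minimal sketch, assuming the container name githubCI_container from the listing above; the function name container_is_running is hypothetical and not part of the existing setup.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the named podman container is running.
container_is_running() {
  local name="$1" state
  # 'podman container inspect' prints the Running flag as "true" or "false".
  state=$(podman container inspect --format '{{.State.Running}}' "$name" 2>/dev/null)
  [ "$state" = "true" ]
}

# Usage (as user CI):
#   container_is_running githubCI_container && echo "runner container is up"
```

The function returns a non-zero status both when the container is stopped and when it does not exist, so it can be used directly in an if statement.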

Start a shell in the container to e.g. debug

As user CI (see above):

podman exec -ti githubCI_container /bin/bash
  • -t allocates a pseudo-terminal
  • -i keeps stdin open (interactive)

Now navigate to the _work directory, where you will find the directories left over from the last CI run.

Restart a container

As user CI (see above):

podman restart githubCI_container

How to create a GPU-capable GitHub runner on CentOS 8

Install podman etc

To set up a GPU-capable container that can run GitHub jobs, the NVIDIA container runtime is needed. On CentOS 8:

  sudo yum install podman
  curl -s -L https://nvidia.github.io/nvidia-container-runtime/centos8/nvidia-container-runtime.repo > nvidia-container-runtime.repo
  sudo mv nvidia-container-runtime.repo /etc/yum-puppet.repos.d/
  sudo yum install nvidia-container-runtime

A few customisations are needed:

  • Patch /etc/nvidia-container-runtime/config.toml with
  [nvidia-container-cli]
  no-cgroups = true
  • By default, images are stored in the home directory (which is on AFS). Change the storage location in ~/.config/containers/storage.conf.
  • Create space to store containers, and fix permissions using, e.g.:
    sudo semanage fcontext -a -e /var/lib/containers /data/containers
    sudo restorecon -R -vv /data/containers
  • Finally, allow containers to be started from systemd, so that podman can run as a service:
    sudo setsebool -P container_manage_cgroup on
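
The first customisation, the config.toml patch, can also be applied from a script. This is a minimal sketch under one assumption: the file already contains a no-cgroups line, possibly commented out (e.g. "#no-cgroups = false"). The function name enable_no_cgroups is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: set "no-cgroups = true" in an nvidia-container-runtime config file.
# Assumption: the file already contains a (possibly commented-out) no-cgroups line.
enable_no_cgroups() {
  local config="$1"
  # Rewrite e.g. "#no-cgroups = false" to "no-cgroups = true"; a backup is kept as <file>.bak.
  sed -i.bak -E 's/^[[:space:]]*#?[[:space:]]*no-cgroups[[:space:]]*=.*/no-cgroups = true/' "$config"
  # Return non-zero if the setting is still absent afterwards.
  grep -q '^no-cgroups = true$' "$config"
}

# Usage, run as root:
#   enable_no_cgroups /etc/nvidia-container-runtime/config.toml
```

If the file has no such line at all, the function leaves it unchanged and returns a non-zero status, in which case the setting has to be added by hand under [nvidia-container-cli].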

Test setup

One can use nvidia-smi to test if the GPU is usable inside the container:

  podman pull nvidia/cuda:11.1-devel-centos8
  # Test that container starts up
  podman run --rm --security-opt=label=disable nvidia/cuda:11.1-devel-centos8 nvidia-smi
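
To turn this manual test into a pass/fail check, the output of nvidia-smi can be inspected instead of read by eye. A minimal sketch, assuming the same image and podman options as above; the function name gpu_visible_in_container is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if nvidia-smi inside the container lists at least one GPU.
gpu_visible_in_container() {
  local image="$1"
  # 'nvidia-smi -L' prints one line per device, each starting with "GPU <index>:".
  podman run --rm --security-opt=label=disable "$image" nvidia-smi -L 2>/dev/null \
    | grep -q '^GPU [0-9]'
}

# Usage:
#   gpu_visible_in_container nvidia/cuda:11.1-devel-centos8 || echo "no GPU visible" >&2
```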

Set up the container image

On top of NVIDIA's CUDA-capable CentOS 8 image, we add a layer with a few additions. The GitHub runner refuses to run as root, so we create a user called "CI" in the container.

cat > containerManifest <<EOF
FROM nvidia/cuda:11.1-devel-centos8
LABEL maintainer="Stephan"

RUN yum install -y cmake which git libicu lttng-ust vim
RUN useradd CI
USER CI
WORKDIR /home/CI/
RUN mkdir actions-runner && cd /tmp/ && curl -O -L https://github.com/actions/runner/releases/download/v2.274.2/actions-runner-linux-x64-2.274.2.tar.gz && cd /home/CI/actions-runner && tar -xzf /tmp/actions-runner-linux-x64-2.274.2.tar.gz
WORKDIR /home/CI/actions-runner
RUN ./config.sh --unattended --url ${repoURL} --token ${githubToken} --replace --name ${runnerName}
COPY ./entrypoint.sh .
RUN chmod u+x ./entrypoint.sh
CMD [ "./entrypoint.sh" ]
EOF
podman build --tag github_runner -f containerManifest

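The three variables are repoURL (the URL passed to config.sh via --url), githubToken (the registration token passed via --token) and runnerName (the name under which the runner registers, passed via --name).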
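
Because the heredoc delimiter EOF is unquoted, the shell substitutes ${repoURL}, ${githubToken} and ${runnerName} at the moment containerManifest is written, so an unset variable silently produces an incomplete config.sh line. A minimal sketch of a guard that could be run first; the function name require_vars is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: fail with a message if any of the named shell variables is unset or empty.
require_vars() {
  local var missing=0
  for var in "$@"; do
    # ${!var} is bash indirect expansion: the value of the variable whose name is stored in $var.
    if [ -z "${!var:-}" ]; then
      echo "missing variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, before the 'cat > containerManifest <<EOF' step, after setting the variables to your values:
#   require_vars repoURL githubToken runnerName || exit 1
```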

The entrypoint.sh is something along the lines of

#!/bin/bash
RUNNER=/home/CI/actions-runner/run.sh

while true; do
  if ! pgrep -f ${RUNNER} > /dev/null 2>&1; then
    # Runner hasn't been started yet or exited because of failure / update
    ${RUNNER}
  else
    # Runner was restarted, and is running in background. Let's find its PID and wait until it exits:
    PID=$(pgrep -f ${RUNNER}) && tail --pid=$PID -f /dev/null
  fi
  sleep 10
done

This loop is needed because the runner process actions-runner/run.sh exits when the GitHub runner auto-updates, which would otherwise stop the container.
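
The waiting step in entrypoint.sh relies on GNU tail: following /dev/null never yields data, and --pid makes tail terminate once the given process is gone. The idiom can be tried in isolation; a minimal sketch, with the hypothetical function name wait_for_pid:

```shell
#!/bin/bash
# Hypothetical helper: block until the process with the given PID has exited.
wait_for_pid() {
  local pid="$1"
  # GNU tail follows /dev/null (which never produces data) and terminates
  # once the process given by --pid no longer exists.
  tail --pid="$pid" -f /dev/null
}

# Usage:
#   sleep 5 &
#   wait_for_pid $!   # returns after roughly five seconds
```

Unlike the shell builtin wait, this also works for processes that are not children of the current shell, which is the situation after the runner has restarted itself in the background.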

Running the container

After creating the container image, one can run it manually using e.g.

# Run container:
# label=disable disables carrying over of SELinux labels for mounts inside the container
podman create --security-opt=label=disable --name githubCI_container github_runner
# podman start returns immediately; the container keeps running in the background
podman start githubCI_container

Or, to install it as a service:

  • Generate systemd unit file:
podman generate systemd --restart-policy=always -t 10 -n githubCI_container
  • Customise and install in e.g. /etc/systemd/system/github-ci.service:
[Unit]
Description=Podman container-githubCI_container.service
Documentation=man:podman-generate-systemd(1)
Wants=network.target
After=network-online.target

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
Restart=always
ExecStart=/usr/bin/podman start githubCI_container
ExecStop=/usr/bin/podman stop -t 10 githubCI_container
ExecStopPost=/usr/bin/podman stop -t 10 githubCI_container
RuntimeDirectory=github-ci.service
KillMode=none
Type=forking
User=CI
Group=CI
Nice=5

[Install]
WantedBy=multi-user.target default.target
  • Start the service with
sudo systemctl daemon-reload
sudo systemctl start github-ci.service
  • Enable start at boot using
sudo systemctl enable github-ci.service
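
Whether the service is running and will come back after a reboot can be checked with systemctl. A minimal sketch, assuming the unit name github-ci.service used above; the function name service_is_healthy is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the unit is currently active and enabled at boot.
service_is_healthy() {
  local unit="$1"
  # With --quiet, both subcommands print nothing and report the result via their exit status.
  systemctl is-active --quiet "$unit" && systemctl is-enabled --quiet "$unit"
}

# Usage:
#   service_is_healthy github-ci.service || systemctl status github-ci.service
```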