Add additional gpu tests (New) (#1359)
* Clone nvidia/cuda-samples repo

* Add arm64 support to gpu-setup

* Fix GPG missing key for cuda repo.

Some repositories (namely 24.04) do not have the cuda-archive-keyring.gpg file. All relevant repositories have a .pub file, however.

* Add some cuda-samples tests.

Added matrixMulDrv, vectorAddDrv, deviceQueryDrv, simpleTextureDrv

* Use uname -m instead of uname -i

* Separate stress test from normal gpgpu tests

* Rename gpgpu test plans

* Fix new gpgpu names in gpgpu-only.pxu

* Fix typos in gpgpu test-plan.pxu

* Integrate gpu-setup into manage.py build call.

* Verify cuda GPG key being imported.

This hardcodes the current gpg key and checks its fingerprint.

* Add checks for architectures.

NOTE: It seems like x86_64 is the only architecture supported
everywhere. Nvidia seems to support arm64 in *some* cases, but not a
lot. Should we only support x86_64, then?

* Build executables into `bin/` and `data/`

This commit changes the gpu-setup script behaviour to build the cuda-samples
and gpu-burn projects inside the `build/bin` directory, then copy the
executables into `bin/` and the necessary data files into `data/`. The
cuda-samples executables need access to their data files, but they do not
take the path to the data dir as an argument. To work around this
limitation, I have added wrapper scripts that copy the necessary files
into the temporary working directory that checkbox creates.

Because of the change in build behaviour, the `gpu-setup` script now runs
mostly as a regular user (to avoid permission issues when cleaning
directories/builds). The expected operation now is to run `./manage.py build`
instead of running the `gpu-setup` script itself. This is more in line
with what is done with the other providers.

* Double quote to prevent globbing and word splitting

* Gracefully exit on unsupported architectures.

For now, we are limiting the gpgpu tests to x86_64 since nvidia only
supports x86_64 consistently across distributions/releases.

* Add snap build dependencies for gpgpu provider

This allows the packaging to complete. The gpgpu provider still fails
due to some issues with setting up the repository, but that does not
prevent the packaging from completing. We may need to look
into vendoring some of the dependencies...

* Remove use of relative paths in `gpu-setup`.

These paths are now resolved to absolute paths to the gpgpu provider's
subdirectories. I also made sure to clean up left-over data files in the
wrapper scripts.

* Revert "Add snap build dependencies for gpgpu provider"

This reverts commit a7def84.

We will revisit properly packaging the gpgpu provider at a later time.
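
The architecture handling described in the commit message (use `uname -m`, map the result to the repository's directory name, and exit gracefully when unsupported) could be sketched roughly as follows. The function name `normalize_arch` is hypothetical and not part of the commit, and the sketch covers only `x86_64` and `aarch64`; the real script also maps `arm` and `aarch64_be` to `arm64`.

```shell
# Hypothetical helper (not in the commit): map `uname -m` output to the
# directory name used in the CUDA repository URL.
# Prints the normalized name, or returns 1 for anything it does not know.
normalize_arch() {
    case "$1" in
        x86_64)  printf '%s\n' "x86_64" ;;
        aarch64) printf '%s\n' "arm64" ;;
        *)       return 1 ;;
    esac
}

# Possible use, mirroring the commit's "graceful exit" (status 0):
#   ARCH=$(normalize_arch "$(uname -m)") || {
#       echo "ERROR: Unsupported architecture $(uname -m)"
#       exit 0
#   }
```

Returning a status instead of exiting inside the helper leaves it to the caller to decide whether an unsupported machine is an error or simply a reason to skip.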
pedro-avalos authored Aug 1, 2024
1 parent 7ab7c0f commit b3df727
Showing 8 changed files with 151 additions and 25 deletions.
4 changes: 4 additions & 0 deletions providers/gpgpu/bin/gpu_burn.sh
@@ -0,0 +1,4 @@
#!/usr/bin/env bash
CUDA_PATH=$(find /usr/local -maxdepth 1 -type d -iname "cuda*")/bin
export PATH=$PATH:$CUDA_PATH
gpu_burn -c "$PLAINBOX_PROVIDER_DATA/compare.ptx" 14400 | grep -v -e '^[[:space:]]*$' -e "errors:" -e "Summary at"
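
One caveat with the `find ... -iname "cuda*"` lookup above: if more than one matching directory exists under `/usr/local`, the command substitution yields several lines and only the last one gets `/bin` appended. A minimal sketch of a more defensive lookup is shown below. The function name `find_cuda_bin` and its prefix parameter are assumptions for illustration, and unlike `-iname` the glob is case-sensitive and also follows symlinks.

```shell
# Hypothetical helper (not in the commit): print the first cuda*/bin
# directory found under a prefix (default /usr/local), or return 1.
find_cuda_bin() {
    for cuda_dir in "${1:-/usr/local}"/cuda*; do
        # An unmatched glob stays literal, so this test also covers "no match".
        if [ -d "$cuda_dir/bin" ]; then
            printf '%s\n' "$cuda_dir/bin"
            return 0
        fi
    done
    return 1
}

# Possible use:
#   CUDA_PATH=$(find_cuda_bin) && export PATH="$PATH:$CUDA_PATH"
```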
4 changes: 4 additions & 0 deletions providers/gpgpu/bin/matrixMulDrv.sh
@@ -0,0 +1,4 @@
#!/usr/bin/env sh
cp -r "$PLAINBOX_PROVIDER_DATA/matrixMulDrv" ./data
matrixMulDrv
rm -r ./data
4 changes: 4 additions & 0 deletions providers/gpgpu/bin/simpleTextureDrv.sh
@@ -0,0 +1,4 @@
#!/usr/bin/env sh
cp -r "$PLAINBOX_PROVIDER_DATA/simpleTextureDrv" ./data
simpleTextureDrv
rm -r ./data
4 changes: 4 additions & 0 deletions providers/gpgpu/bin/vectorAddDrv.sh
@@ -0,0 +1,4 @@
#!/usr/bin/env sh
cp -r "$PLAINBOX_PROVIDER_DATA/vectorAddDrv" ./data
vectorAddDrv
rm -r ./data
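
The three wrappers above share one pattern: copy the sample's data into `./data`, run the sample, then remove the copy. As written, each wrapper's exit status is that of the final `rm`, so a failing sample may not be reported as a failure. A minimal sketch of a shared helper that keeps the sample's exit status is given below. The name `run_with_data` is hypothetical and not part of the commit; `PLAINBOX_PROVIDER_DATA` is the variable the wrappers already use.

```shell
# Hypothetical helper (not in the commit).
# Usage: run_with_data DATA_SRC COMMAND [ARGS...]
# Copies DATA_SRC to ./data, runs COMMAND, removes ./data again, and
# returns COMMAND's exit status rather than the cleanup's.
run_with_data() {
    data_src=$1
    shift
    cp -r "$data_src" ./data || return 1
    "$@"
    rc=$?
    rm -rf ./data
    return "$rc"
}

# Possible use inside a wrapper script:
#   run_with_data "$PLAINBOX_PROVIDER_DATA/matrixMulDrv" matrixMulDrv
```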
97 changes: 77 additions & 20 deletions providers/gpgpu/tools/gpu-setup
@@ -1,10 +1,5 @@
#!/bin/bash

if [[ $EUID -ne 0 ]]; then
echo "ERROR: This script must be run as root"
exit 1
fi

echo "Configuring system for GPU Testing"
echo "**********************************"
echo "*"
@@ -43,35 +38,97 @@ echo "* Adding nVidia package repository"
# SAUCE: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=debnetwork

OSRELEASE=ubuntu`lsb_release -r | cut -f2 |sed -e 's/\.//'`
REPO_URL="https://developer.download.nvidia.com/compute/cuda/repos/$OSRELEASE/x86_64"
PINFILE="cuda-$OSRELEASE.pin"
if [[ "$OSRELEASE" =~ "2004" ]] || [[ "$OSRELEASE" =~ "1804" ]]; then
KEYFILE="cuda-$OSRELEASE-keyring.gpg"
else
KEYFILE="cuda-archive-keyring.gpg"
ARCH=`uname -m`
case $ARCH in
arm)
;&
aarch64_be)
;&
aarch64)
ARCH="arm64"
;;
x86_64)
;;
*)
echo "ERROR: Unsupported architecture $ARCH"
exit 0
;;
esac
REPO_URL="https://developer.download.nvidia.com/compute/cuda/repos/$OSRELEASE/$ARCH"

# Import and verify cuda gpg key
wget -O cuda-archive-keyring.gpg "$REPO_URL/3bf863cc.pub"
if [[ $? -eq 8 ]] ; then
echo "ERROR: wget failed. Check networking or $OSRELEASE/$ARCH not supported?"
exit 0
fi
gpg --no-default-keyring --keyring ./temp-keyring.gpg \
--import cuda-archive-keyring.gpg
gpg --no-default-keyring --keyring ./temp-keyring.gpg \
--fingerprint "EB693B3035CD5710E231E123A4B469963BF863CC"
if [[ $? -ne 0 ]] ; then
echo "ERROR: GPG key import failed. Invalid gpg key?"
exit 1
fi
wget -O /etc/apt/preferences.d/cuda-repository-pin-600 "$REPO_URL/$PINFILE"
wget -O /etc/apt/trusted.gpg.d/$KEYFILE "$REPO_URL/$KEYFILE"
add-apt-repository -y "deb http://developer.download.nvidia.com/compute/cuda/repos/$OSRELEASE/x86_64/ /"
sudo gpg --yes --no-default-keyring --keyring ./temp-keyring.gpg --export \
--output /usr/share/keyrings/cuda-archive-keyring.gpg
rm ./temp-keyring.gpg
rm ./temp-keyring.gpg~
rm ./cuda-archive-keyring.gpg

PINFILE="cuda-$OSRELEASE.pin"
sudo wget -O /etc/apt/preferences.d/cuda-repository-pin-600 "$REPO_URL/$PINFILE"

# The heredoc delimiter is left unquoted so that $OSRELEASE and $ARCH expand
sudo tee "/etc/apt/sources.list.d/cuda-$OSRELEASE-$ARCH.list" << EOF
deb [signed-by=/usr/share/keyrings/cuda-archive-keyring.gpg] http://developer.download.nvidia.com/compute/cuda/repos/$OSRELEASE/$ARCH/ /
EOF

# Install necessary files
#apt update
echo "* Installing necessary packages"
apt install -y build-essential git
sudo apt install -y build-essential git
## need to break this out to fix issue where cuda installs gdm3
apt install -y --no-install-recommends cuda-toolkit
sudo apt install -y --no-install-recommends cuda-toolkit

#fix the path to get nvcc from the cuda package
CUDA_PATH=$(find /usr/local -maxdepth 1 -type d -iname "cuda*")/bin
export PATH=$PATH:$CUDA_PATH

# Get the build and output directories, make them if they don't exist yet
SCRIPT_DIR="$(dirname -- "$(readlink -f -- "$0")")"
PROVIDER_PATH="$(dirname -- "$SCRIPT_DIR")"
BUILD_DIR="$PROVIDER_PATH/build"
BIN_DIR="$PROVIDER_PATH/bin"
DATA_DIR="$PROVIDER_PATH/data"
mkdir -p "$BUILD_DIR" "$BIN_DIR" "$DATA_DIR"

# clone cuda-samples repo
echo "* Cloning cuda-samples repo"
CUDA_SAMPLES_DIR="$BUILD_DIR/cuda-samples"
git clone https://github.com/nvidia/cuda-samples.git "$CUDA_SAMPLES_DIR"
echo "* Building cuda-samples tests"
make -C "$CUDA_SAMPLES_DIR/Samples/0_Introduction/matrixMulDrv"
cp "$CUDA_SAMPLES_DIR/Samples/0_Introduction/matrixMulDrv/matrixMulDrv" "$BIN_DIR/"
cp -r "$CUDA_SAMPLES_DIR/Samples/0_Introduction/matrixMulDrv/data" "$DATA_DIR/matrixMulDrv"
make -C "$CUDA_SAMPLES_DIR/Samples/0_Introduction/vectorAddDrv"
cp "$CUDA_SAMPLES_DIR/Samples/0_Introduction/vectorAddDrv/vectorAddDrv" "$BIN_DIR/"
cp -r "$CUDA_SAMPLES_DIR/Samples/0_Introduction/vectorAddDrv/data" "$DATA_DIR/vectorAddDrv"
make -C "$CUDA_SAMPLES_DIR/Samples/1_Utilities/deviceQueryDrv"
cp "$CUDA_SAMPLES_DIR/Samples/1_Utilities/deviceQueryDrv/deviceQueryDrv" "$BIN_DIR/"
make -C "$CUDA_SAMPLES_DIR/Samples/0_Introduction/simpleTextureDrv"
cp "$CUDA_SAMPLES_DIR/Samples/0_Introduction/simpleTextureDrv/simpleTextureDrv" "$BIN_DIR/"
cp -r "$CUDA_SAMPLES_DIR/Samples/0_Introduction/simpleTextureDrv/data" "$DATA_DIR/simpleTextureDrv"
echo "* Building cuda-samples tests completed..."

# get the gpu-burn repo and build it
echo "* Cloning gpu-burn repo"
GPU_BURN_DIR=/opt/gpu-burn
git clone https://github.com/wilicc/gpu-burn.git $GPU_BURN_DIR
cd $GPU_BURN_DIR
GPU_BURN_DIR="$BUILD_DIR/gpu-burn"
git clone https://github.com/wilicc/gpu-burn.git "$GPU_BURN_DIR"
echo "* Building gpu-burn"
make && echo "* Build completed..."
make -C "$GPU_BURN_DIR"
cp "$GPU_BURN_DIR/gpu_burn" "$BIN_DIR/"
cp "$GPU_BURN_DIR/compare.ptx" "$DATA_DIR/"
echo "* Build completed..."
echo "*"
echo "* Completed installation. Please reboot the machine now"
echo "* to load the nVidia proprietary drivers"
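
The fingerprint check in the script above relies on the exit status of `gpg --fingerprint`. An alternative, sketched below, is to compare the fingerprint explicitly against GnuPG's machine-readable output, where `fpr` records carry the fingerprint in the tenth colon-separated field. The function name `has_fingerprint` is hypothetical and not part of the commit; the fingerprint value is the one hardcoded in the script.

```shell
# Hypothetical helper (not in the commit): read `gpg --with-colons`
# output on stdin and succeed only if an "fpr" record carries the
# expected fingerprint in field 10.
has_fingerprint() {
    awk -F: -v want="$1" '
        $1 == "fpr" && $10 == want { found = 1 }
        END { exit !found }
    '
}

# Possible use with the downloaded key file:
#   gpg --show-keys --with-colons cuda-archive-keyring.gpg \
#       | has_fingerprint "EB693B3035CD5710E231E123A4B469963BF863CC" \
#       || { echo "ERROR: unexpected GPG key fingerprint"; exit 1; }
```

Parsing the colon format avoids depending on the human-readable layout, which can differ between GnuPG versions.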
3 changes: 2 additions & 1 deletion providers/gpgpu/units/gpgpu-only.pxu
@@ -7,7 +7,8 @@ nested_part:
com.canonical.certification::server-info-attachment-automated
com.canonical.certification::server-firmware
com.canonical.certification::server-miscellaneous
com.canonical.certification::gpgpu-tests
com.canonical.certification::gpgpu-stress
com.canonical.certification::gpgpu-automated
include:
bootstrap_include:
device
43 changes: 42 additions & 1 deletion providers/gpgpu/units/jobs.pxu
@@ -4,5 +4,46 @@ plugin: shell
estimated_duration: 300
requires:
package.name == 'cuda-toolkit'
uname.machine == 'x86_64'
_summary: GPGPU stress testing
command: cd /opt/gpu-burn/ && ./gpu_burn 14400 | grep -v -e '^[[:space:]]*$' -e "errors:" -e "Summary at"
command: gpu_burn.sh

id: gpgpu/matrix-mul-drv
category_id: gpgpu
plugin: shell
estimated_duration: 4
requires:
package.name == 'cuda-toolkit'
uname.machine == 'x86_64'
_summary: GPGPU matrix multiplication
command: matrixMulDrv.sh

id: gpgpu/vector-add-drv
category_id: gpgpu
plugin: shell
estimated_duration: 4
requires:
package.name == 'cuda-toolkit'
uname.machine == 'x86_64'
_summary: GPGPU vector addition
command: vectorAddDrv.sh

id: gpgpu/device-query-drv
category_id: gpgpu
plugin: shell
estimated_duration: 4
requires:
package.name == 'cuda-toolkit'
uname.machine == 'x86_64'
_summary: GPGPU query device
command: deviceQueryDrv

id: gpgpu/simple-texture-drv
category_id: gpgpu
plugin: shell
estimated_duration: 4
requires:
package.name == 'cuda-toolkit'
uname.machine == 'x86_64'
_summary: GPGPU simple textures
command: simpleTextureDrv.sh
17 changes: 14 additions & 3 deletions providers/gpgpu/units/test-plan.pxu
@@ -1,8 +1,19 @@
id: gpgpu-tests
id: gpgpu-stress
unit: test plan
_name: GPGPU Compute Testing
_name: GPGPU Compute Stress Testing
_description:
Tests for GPGPU Computations (non-graphical)
Stress Tests for GPGPU Computations (non-graphical)
mandatory_include:
gpgpu/gpu-burn
include:

id: gpgpu-automated
unit: test plan
_name: GPGPU Compute Automated Testing
_description:
Automated Tests for GPGPU Computations (non-graphical)
include:
gpgpu/matrix-mul-drv
gpgpu/vector-add-drv
gpgpu/device-query-drv
gpgpu/simple-texture-drv
