Add additional gpu tests (New) #1359

pedro-avalos · 2024-07-23T15:08:17Z

Description

Update gpu-setup script to work with arm64
Update gpu-setup script to build tests from cuda-samples
Add 4 new gpgpu tests from cuda-samples
Separate gpgpu automated tests from stress tests

Resolved issues

https://warthogs.atlassian.net/browse/CHECKBOX-967

Documentation

N/A

Tests

To run the tests, install the NVIDIA drivers, then run the tools/gpu-setup bash script to install cuda-toolkit and build the cuda-samples tests.

Then run `checkbox-cli run com.canonical.certification::gpgpu-automated`.

codecov · 2024-07-23T15:12:28Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 45.12%. Comparing base (9639ba7) to head (659a452).
Report is 126 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1359   +/-   ##
=======================================
  Coverage   45.12%   45.12%           
=======================================
  Files         366      366           
  Lines       39058    39058           
  Branches     6607     6607           
=======================================
  Hits        17626    17626           
  Misses      20758    20758           
  Partials      674      674

Flag	Coverage Δ
provider-gpgpu	`57.14% <ø> (ø)`

Hook25

Please consider the comments below. In general, we have to think carefully about distribution when adding dependencies to providers; there are a lot of moving parts.

Consider also rebasing this PR; you will get a new pipeline that builds the packages and checks that everything is all right.

providers/gpgpu/tools/gpu-setup

This commit changes the gpu-setup script to build the cuda-samples and gpu-burn projects inside the `build/bin` directory, then copy the executables into `bin/` and the necessary data files into `data/`. The cuda-samples executables need access to their data files, but they do not take the path to the data directory as an argument; to work around this limitation, I have added wrapper scripts that copy the necessary file into the temporary working directory that checkbox creates. Because of the change in build behaviour, the `gpu-setup` script now runs mostly as a regular user (to avoid permission issues when cleaning directories/builds). The expected workflow is now to run `./manage.py build` instead of running the `gpu-setup.sh` script directly. This is more in line with what is done for the other providers.
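The wrapper approach described above can be sketched roughly as follows. This is a minimal illustration, not the wrapper shipped in the PR: the function name `run_with_data` and its argument order are assumptions, and it assumes the sample looks for its data file in the current working directory.

```shell
#!/bin/sh
# Hypothetical wrapper (not the PR's actual script): copy a sample's data
# file into the current working directory, run the sample there, then
# remove the copy again.
# Usage: run_with_data <executable> <data-file> [args...]
run_with_data() {
    exe=$1
    data=$2
    shift 2
    copy="./$(basename "$data")"
    cp -- "$data" "$copy" || return 1
    "$exe" "$@"
    status=$?
    rm -f -- "$copy"    # do not leave the data file behind
    return "$status"    # report the sample's own exit status
}
```

A call might look like `run_with_data "$BIN_DIR/someSample" "$DATA_DIR/someFile"`, where both variables and both file names are placeholders. Returning the sample's exit status keeps the wrapper transparent to whatever runs it.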

Hook25

Ok the changes are taking shape. Well done!

I've tested the snap builds on your branch and they don't work anymore. This is the tail of the log that leads to the failure:

Configuring system for GPU Testing
**********************************
*
*  Testing network connectivity
../../tools/gpu-setup: line 10: ping: command not found
ERROR: This script requires internet access to function correctly
make: *** [gpu-setup] Error 1
../../src/Makefile:5: recipe for target 'gpu-setup' failed
Failed to run 'override-build': Exit code was 2.
Build failed

To reproduce this, you can do the following:

cd checkbox-core-snap; ./prepare.sh series16 
cd series16
snapcraft remote-build

This should fail with the same error shown above. It is crucial that you use remote-build, as the image LP uses is slightly different from the one you have in lxd.

As to how to fix this, try to add ping to the build-depends of the part. If that works please do also test series18, 20, 22 and 24. Note that for 24 the build process is slightly different (you have to use snapcraft8_prepare.sh and commit the changes, you can then reset HEAD~1 once you are done).
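The log above reports missing internet access when the underlying problem is that `ping` is not installed. Independently of the build-depends fix, a pre-flight check along these lines would make that distinction visible. This is only a sketch: `require_cmds` is a hypothetical helper, not something that exists in gpu-setup.

```shell
# Hypothetical helper: succeed if every named command is on PATH,
# otherwise print the missing ones to stderr and fail.
require_cmds() {
    missing=""
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
    done
    if [ -n "$missing" ]; then
        echo "ERROR: missing required commands:$missing" >&2
        return 1
    fi
    return 0
}
```

Calling `require_cmds ping || exit 1` before the connectivity test would then fail with a message naming `ping` rather than blaming the network.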

Hook25

As per our agreement, this is way too (legacy) broken to fix it all without a follow-up, and we need to decide on the best way to do that. For now, we land this excluded from the provider build system.

Please do create the cards in Jira to highlight the issues at hand; we will discuss them and schedule them in a future pulse.

pedro-avalos assigned pieqq Jul 23, 2024

Hook25 requested changes Jul 29, 2024

Hook25 added the waiting-for-changes label Jul 29, 2024

Hook25 force-pushed the add-additional-gpu-tests branch from 3c9f6a5 to bbfbd68 Compare July 29, 2024 14:45

pedro-avalos added 10 commits July 29, 2024 16:46

Clone nvidia/cuda-samples repo

3dceb34

Add arm64 support to gpu-setup

970ece7

Fix GPG missing key for cuda repo.

e92c17b

Some repositories (namely 24.04) do not have the cuda-archive-keyring.gpg file. All relevant repositories have a .pub file, however.

Add some cuda-samples tests.

3b68e48

Added matrixMulDrv, vectorAddDrv, deviceQueryDrv, simpleTextureDrv

Use uname -m instead of uname -i

2125c1f

Separate stress test from normal gpgpu tests

5fc54be

Rename gpgpu test plans

0de931c

Fix new gpgpu names in gpgpu-only.pxu

2b37179

Fix typos in gpgpu test-plan.pxu

cf1dfdf

Integrate gpu-setup into manage.py build call.

89dec7b

Hook25 force-pushed the add-additional-gpu-tests branch from bbfbd68 to 89dec7b Compare July 29, 2024 14:46

pedro-avalos added 5 commits July 29, 2024 10:50

Verify cuda GPG key being imported.

9fe8e20

This hardcodes the current gpg key and checks its fingerprint.
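A pinned-fingerprint check of this kind could look roughly like the sketch below. It is not the code from the commit. The helper names are hypothetical, the expected fingerprint is passed in by the caller and no real NVIDIA value is shown, and it assumes a GnuPG 2.x release recent enough to provide `--show-keys`.

```shell
# Normalise a fingerprint: drop whitespace and upper-case the hex digits.
normalize_fpr() {
    printf '%s' "$1" | tr -d '[:space:]' | tr '[:lower:]' '[:upper:]'
}

# Print the first fingerprint found in a key file without importing it.
# In gpg's colon-delimited output, "fpr" records carry it in field 10.
key_fingerprint() {
    gpg --show-keys --with-colons "$1" |
        awk -F: '$1 == "fpr" { print $10; exit }'
}

# Succeed only if the key file's fingerprint equals the pinned value.
# Usage: verify_key <key-file> <expected-fingerprint>
verify_key() {
    actual=$(normalize_fpr "$(key_fingerprint "$1")")
    expected=$(normalize_fpr "$2")
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

Normalising both sides lets the pinned value be written with or without spaces. The non-empty check avoids a false match when no key could be read.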

Add checks for architectures.

a854201

NOTE: It seems like x86_64 is the only architecture supported everywhere. Nvidia seems to support arm64 in *some* cases, but not a lot. Should we only support x86_64, then?

Double quote to prevent globbing and word splitting

29e7d5e
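The commit title matches the wording of ShellCheck's SC2086 warning. The short demonstration below shows the behaviour the warning guards against. The `count_args` helper and the example path are made up for illustration.

```shell
# Print how many arguments the function received.
count_args() { echo "$#"; }

path="my data/file.bin"

count_args $path      # unquoted: the value is split on the space -> 2
count_args "$path"    # quoted: passed as a single argument -> 1
```

An unquoted expansion is also subject to globbing, so a value containing `*` could expand to file names in the current directory.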

Gracefully exit on unsupported architectures.

58330a1

For now, we are limiting the gpgpu tests to x86_64 since nvidia only supports x86_64 consistently across distributions/releases.
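A graceful exit on unsupported architectures might be structured as in the sketch below. The function name `arch_supported` and the message text are assumptions, and treating "graceful" as a zero exit status is an interpretation of the commit title rather than something stated in the PR.

```shell
# Hypothetical check: only x86_64 is treated as supported, matching the
# restriction described in the commit message.
arch_supported() {
    case "$1" in
        x86_64) return 0 ;;
        *)      return 1 ;;
    esac
}

arch="$(uname -m)"    # machine hardware name, e.g. x86_64 or aarch64
if ! arch_supported "$arch"; then
    echo "Skipping GPGPU setup: unsupported architecture '$arch'"
    # A real setup script would likely `exit 0` here so that the
    # surrounding build does not fail (assumption).
fi
```

Keeping the check in a function that takes the architecture as an argument makes it easy to exercise without changing machines.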

Hook25 requested changes Jul 31, 2024

pedro-avalos added 3 commits July 31, 2024 13:13

Add snap build dependencies for gpgpu provider

a7def84

This allows the packaging to complete. The gpgpu provider still fails due to some issues with setting up the repository, but that does not prevent the packaging from completing. We may need to look into vendoring some of the dependencies...

Remove use of relative paths in gpu-setup.

f7616ae

These paths are now resolved to absolute paths to the gpgpu provider's subdirectories. I also made sure to clean up left-over data files in the wrapper scripts.
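One common way to resolve such paths is sketched below, as an illustration rather than the code from the commit. `abs_dir_of` is a hypothetical helper.

```shell
# Print the absolute, symlink-resolved path of the directory containing
# the given file. The subshell keeps the caller's working directory
# unchanged.
abs_dir_of() {
    (cd -- "$(dirname "$1")" && pwd -P)
}
```

A script could then anchor every other path on its own location, for example `TOOLS_DIR="$(abs_dir_of "$0")"`, so that it behaves the same regardless of the directory it is invoked from. The variable name is again only illustrative.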

Revert "Add snap build dependencies for gpgpu provider"

659a452

This reverts commit a7def84. We will revisit properly packaging the gpgpu provider at a later time.

Hook25 approved these changes Aug 1, 2024

Hook25 assigned Hook25 and unassigned pieqq Aug 1, 2024

pedro-avalos merged commit b3df727 into main Aug 1, 2024
40 checks passed

pedro-avalos deleted the add-additional-gpu-tests branch August 1, 2024 20:17
