Add additional gpu tests (New) #1359
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1359   +/-   ##
=======================================
  Coverage   45.12%   45.12%
=======================================
  Files         366      366
  Lines       39058    39058
  Branches     6607     6607
=======================================
  Hits        17626    17626
  Misses      20758    20758
  Partials      674      674
Please consider the comments below. In general we have to consider distribution when adding dependencies to providers and do so carefully, there are a lot of moving parts.
Consider also rebasing this PR; you will get a new pipeline that builds the packages and checks that everything is all right.
3c9f6a5 to bbfbd68
Some repositories (namely 24.04) do not have the cuda-archive-keyring.gpg file. All relevant repositories have a .pub file, however.
Added matrixMulDrv, vectorAddDrv, deviceQueryDrv, simpleTextureDrv
bbfbd68 to 89dec7b
This hardcodes the current gpg key and checks its fingerprint.
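A pinned-fingerprint check along these lines could be sketched as follows. This is only an illustration and not the code in the commit: the helper names `first_fingerprint` and `fingerprint_matches` are hypothetical, and the sketch assumes the key is listed in GnuPG's colon-delimited format, where `fpr` records carry the fingerprint in the tenth field.

```shell
# Hypothetical helper: read GnuPG colon-format output on stdin and print
# the first fingerprint found (field 10 of the first "fpr" record).
first_fingerprint() {
    awk -F: '$1 == "fpr" { print $10; exit }'
}

# Hypothetical helper: succeed only if the expected and actual fingerprints
# are non-empty and equal, ignoring spaces and letter case.
fingerprint_matches() {
    local expected
    local actual
    expected=$(printf '%s' "$1" | tr -d ' ' | tr '[:lower:]' '[:upper:]')
    actual=$(printf '%s' "$2" | tr -d ' ' | tr '[:lower:]' '[:upper:]')
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

Assuming a GnuPG version that supports `--show-keys`, the listing could be produced with `gpg --show-keys --with-colons key.pub | first_fingerprint`, and the setup script would abort when `fingerprint_matches` fails against the hardcoded value.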
NOTE: It seems like x86_64 is the only architecture supported everywhere. Nvidia seems to support arm64 in *some* cases, but not a lot. Should we only support x86_64, then?
This commit changes the gpu-setup script behaviour to build the cuda-samples and gpu-burn projects inside the `build/bin` directory, then copy the executables into `bin/` and the necessary data files into `data/`. The cuda-samples executables need access to their data files, but they do not take the path to the data dir as an argument. To work around this limitation, I have added wrapper scripts that copy the necessary file into the temporary working directory that checkbox creates. Because of the change in build behaviour, the `gpu-setup` script now runs mostly as a regular user (to avoid permission issues when cleaning directories/builds). The expected operation now is to run `./manage.py build` instead of running the `gpu-setup.sh` script itself. This is more in line with what is done with the other providers.
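The wrapper approach described above could look roughly like this. It is a minimal sketch under stated assumptions, not the wrapper scripts from the commit: `run_with_data` is a hypothetical name, and it assumes each sample looks for exactly one data file in its current working directory.

```shell
# Hypothetical wrapper: copy one data file into the current working
# directory, run the sample there, then remove the copy again.
# Usage: run_with_data <binary> <data_file> [args...]
run_with_data() {
    local binary
    local data_file
    local copy
    local status
    binary=$1
    data_file=$2
    shift 2
    copy="./$(basename "$data_file")"
    # Fail early if the data file cannot be copied next to the sample.
    cp -- "$data_file" "$copy" || return 1
    "$binary" "$@"
    status=$?
    # Clean up so no data file is left behind in the working directory.
    rm -f -- "$copy"
    return "$status"
}
```

The sample's own exit status is returned, so a failing test is still reported as a failure after the clean-up.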
For now, we are limiting the gpgpu tests to x86_64 since nvidia only supports x86_64 consistently across distributions/releases.
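One way such a restriction could be expressed in a setup script is an early architecture check. This is a hedged sketch; `is_supported_arch` is a hypothetical helper and the commit may enforce the limit differently.

```shell
# Hypothetical helper: succeed only on x86_64.
# The architecture defaults to the output of `uname -m`, but can be passed
# explicitly, which makes the check easy to exercise on any machine.
is_supported_arch() {
    local arch
    arch=${1:-$(uname -m)}
    [ "$arch" = "x86_64" ]
}
```

A script could then skip the CUDA setup with something like `is_supported_arch || exit 0`.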
Ok the changes are taking shape. Well done!
I've tested the snap builds on your branch and they don't work anymore. This is the "tail" of the log that leads to the failure:
Configuring system for GPU Testing
**********************************
*
* Testing network connectivity
../../tools/gpu-setup: line 10: ping: command not found
ERROR: This script requires internet access to function correctly
make: *** [gpu-setup] Error 1
../../src/Makefile:5: recipe for target 'gpu-setup' failed
Failed to run 'override-build': Exit code was 2.
Build failed
To reproduce this you can do the following:
cd checkbox-core-snap; ./prepare.sh series16
cd series16
snapcraft remote-build
This should fail with the same error shown above. It is crucial that you use remote-build, as the image LP uses is slightly different from the one you have in lxd.
As to how to fix this, try to add ping to the build-depends of the part. If that works please do also test series18, 20, 22 and 24. Note that for 24 the build process is slightly different (you have to use snapcraft8_prepare.sh and commit the changes, you can then reset HEAD~1 once you are done).
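Independently of the build-depends fix, the connectivity check could tell a missing `ping` binary apart from a real network failure, which would have made the log above easier to read. The following is a sketch only: `check_connectivity` is a hypothetical function, the default host is an assumption (the host actually used by `gpu-setup` is not shown here), and the `-c`/`-W` options assume the iputils version of `ping`.

```shell
# Hypothetical connectivity check.
# Returns 2 when ping is not installed, 1 when the host is unreachable,
# and 0 when a single echo request succeeds.
check_connectivity() {
    local host
    host=${1:-ubuntu.com}   # assumed default host, for illustration only
    if ! command -v ping >/dev/null 2>&1; then
        echo "ERROR: ping is not installed; add it to the build dependencies" >&2
        return 2
    fi
    if ! ping -c 1 -W 5 "$host" >/dev/null 2>&1; then
        echo "ERROR: This script requires internet access to function correctly" >&2
        return 1
    fi
    return 0
}
```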
This allows the packaging to complete. The gpgpu provider still fails due to some issues with setting up the repository, but it does not prevent the packaging from completing. We may need to look into vendorizing some of the dependencies...
These paths are now resolved to absolute paths to the gpgpu provider's subdirectories. I also made sure to clean up left-over data files in the wrapper scripts.
This reverts commit a7def84. We will revisit properly packaging the gpgpu provider at a later time.
As per our agreement, this is way too (legacy) broken to fix it all without a follow-up, and we need to decide what the best way to do that is. For now, we land this excluded from the provider building system.
Please do create the cards in Jira to highlight the issue at hand; we will discuss them and schedule them in a future pulse.
Description
Resolved issues
https://warthogs.atlassian.net/browse/CHECKBOX-967
Documentation
N/A
Tests
To run the tests, install the nvidia drivers, then run the tools/gpu-setup bash script to install cuda-toolkit and build the cuda-samples tests.
Then run:
checkbox-cli run com.canonical.certification::gpgpu-automated