-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change to new graphic test strategy (BugFix) #586
Change to new graphic test strategy (BugFix) #586
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this big update! You've led a great work, first with some researches and documentation on the prime and reverse prime process, then with this implementation!
I made some suggestions inline, please check them and let me know if you have any question.
I've seen that you've created new test plans (for instance graphics-gpu-cert-automated
), but it's not immediately clear why this is so. My understanding is that they are only to be used with 22.04+ test plans, in order to keep backward compatibility with previous OSes and jobs? It would be good to have some kind of explanation somewhere (I like to put this kind of info in the git commit message, because there is a higher change to find this info back when doing code archeology later on).
141c780
to
aaf9eda
Compare
@pieqq I've fixed the code and description as your suggestion. If there is something I missed, please let me know. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @hanhsuan, thanks for well explanation!
I'll run test on DUTs based on our discussion.
I couldn't find the test job below when running graphics-gpu-cert-automated, also failed to run it directly, am I missing something? could you please provide your test steps?
steps to reproduce:
Test steps is fine, just can't be run via ssh, because DUT will try to ask host run graphic test and failed due to no permission(and even if use |
Hey @hanhsuan, as discussed, we ran |
67c4a91
to
b129c98
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #586 +/- ##
==========================================
+ Coverage 34.83% 39.00% +4.16%
==========================================
Files 302 307 +5
Lines 34165 35240 +1075
Branches 5909 6058 +149
==========================================
+ Hits 11903 13744 +1841
+ Misses 21697 20890 -807
- Partials 565 606 +41
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@diohe0311 Thanks for you help to noticed me some issues in my PR, I have modified to make |
/canonical/self-hosted-runners/run-workflows 28e1ba4 |
… depending on index For Nvidia GPU, the prime/reverse prime offload is not supported before version 435.17. Therefore, This new strategy is only for 22.04+. For backward compatibility, this PR add new test plans for 22.04+ as follow: graphics-gpu-cert-full graphics-gpu-cert-automated graphics-gpu-cert-manual after-suspend-graphics-gpu-cert-full after-suspend-graphics-gpu-cert-automated after-suspend-graphics-gpu-cert-manual monitor-gpu-cert-full monitor-gpu-cert-automated monitor-gpu-cert-manual after-suspend-monitor-gpu-cert-full after-suspend-monitor-gpu-cert-automated after-suspend-monitor-gpu-cert-manual And add new python script "prime_offload_tester.py" to execute command with prime/reverse prime setting for new test jobs as follow: Auto test: graphics/{index}_auto_glxgears_{product_slug} graphics/{index}_auto_glxgears_fullscreen_{product_slug} Manual: graphics/{index}_valid_glxgears_{product_slug} graphics/{index}_valid_glxgears_fullscreen_{product_slug}
28e1ba4
to
79599f8
Compare
/canonical/self-hosted-runners/run-workflows 79599f8 |
/canonical/self-hosted-runners/run-workflows 93c5634 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've hit 20 comments mark.
Overall comments are:
- I think the logic regarding not find card is wrong (and test also reflects the wrong logic)
- the error handling is unnecessarily complicated, and not following python
The branch should have been easily split into smaller chunks. For instance the checkbox job definitions don't have to be here. Let's start with the testing program.
I couldn't figure out where is the logic error, cloud you explain more about this? |
2. add extra method for avoid checking fail by 6.5 kernel bug
The submissions for the new commit: |
The bug of 6.5 kernel is a public bug now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- There are
time.sleep
()s being run when executing unit tests from this PR (it takes 2 minutes to run unit tests) - The code is unnecessarily complicated - see below.
- There are unnecessary double exception contexts.
- The
if __name__ == "__main__":
is abused, should only call main - The PR is too big. Please split it into smaller chunks (like one function with unit tests that does one thing).
- The tests provided here are not unit tests. They also suffer from having to monkeypatch a lot of thing in the god object because of the suboptimal problem decomposition.
Overall recommendation: try isolating one problem at a time, write a function and that solves that one problem together with tests for that function and file a PR.
Then another one, another one, and so on. After 4-5PRs, you'll be able to compose a small change to a Checkbox job that uses all those functions.
This cannot land in this state.
This script provides the function to run and validate the process on specific GPU. There is a bug between kernel 6.3 to 6.5.0.14. https://bugs.launchpad.net/ubuntu/+source/linux-oem-6.5/+bug/2047461 Please don't use this script on those kernel versions.
2. Bug of 6.5 kernel is released in proposed kernel 6.5.0.16 and have tested. Therefore, removing workaround.
2. add more unit tests
submission
|
…st plan only. 1. New jobs that uses prime_offload_tester.py to valid GPU rendering. 2. New test plans that conbimes integrated and discrete GPU into one. 3. Remove unnecessary jobs and test plans after switching to new graphic test strategy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function decomposition proposed here introduces unnecessary complexity, and makes both, the logic, and the tests less readable and harder to maintain.
There are some problem around
In line I'm providing comments and suggestions.
Co-authored-by: kissiel <[email protected]>
Co-authored-by: kissiel <[email protected]>
Co-authored-by: kissiel <[email protected]>
2. fix docstring error 3. change default to 20s and the logic in the check_offload 4. change RuntimeError to SystemExit
@kissiel I have fixed the code. Please help me to review, while you have time. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few tweaks that could be applied to this PR, but it's not critical, so we can land this.
Thank you for the very extensive work on this!
The pxus got removed, so Pierre's request is no longer valid.
* Changing gpu test strategy to prime/reverse-prime gpu offload without depending on index For Nvidia GPU, the prime/reverse prime offload is not supported before version 435.17. Therefore, This new strategy is only for 22.04+. For backward compatibility, this PR add new test plans for 22.04+ as follow: graphics-gpu-cert-full graphics-gpu-cert-automated graphics-gpu-cert-manual after-suspend-graphics-gpu-cert-full after-suspend-graphics-gpu-cert-automated after-suspend-graphics-gpu-cert-manual monitor-gpu-cert-full monitor-gpu-cert-automated monitor-gpu-cert-manual after-suspend-monitor-gpu-cert-full after-suspend-monitor-gpu-cert-automated after-suspend-monitor-gpu-cert-manual And add new python script "prime_offload_tester.py" to execute command with prime/reverse prime setting for new test jobs as follow: Auto test: graphics/{index}_auto_glxgears_{product_slug} graphics/{index}_auto_glxgears_fullscreen_{product_slug} Manual: graphics/{index}_valid_glxgears_{product_slug} graphics/{index}_valid_glxgears_fullscreen_{product_slug} * Add more unit test for graphics_card_resource.py and prime_offload_tester.py * Add one more unit test * move parse arguments to single function for unit testing * Fix flake8 error * 1. Refactory to be more like python 2. add extra method for avoid checking fail by 6.5 kernel bug * Fix flake8 error * add executable permission * 1. Move changes of job and test-plan to another PR 2. Bug of 6.5 kernel is released in proposed kernel 6.5.0.16 and have tested. Therefore, removing workaround. * 1. Move change of jobs and test plan to another PR 2. add more unit tests * Fix pci BDF format check error ref: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/pci/early.c?id=refs/tags/v3.12.7#n65 https://wiki.xenproject.org/wiki/Bus:Device.Function_(BDF)_Notation * Update providers/base/bin/prime_offload_tester.py Co-authored-by: kissiel <[email protected]> * Update providers/base/bin/prime_offload_tester.py Co-authored-by: kissiel <[email protected]> * Update providers/base/bin/prime_offload_tester.py Co-authored-by: kissiel <[email protected]> * 1. move the get clients from check_offload to get_client 2. fix docstring error 3. change default to 20s and the logic in the check_offload 4. change RuntimeError to SystemExit --------- Co-authored-by: kissiel <[email protected]>
* This is part of #586 that includes the changes of job and test plan only. 1. New jobs that uses prime_offload_tester.py to validate GPU rendering. 2. New test plans that combines integrated and discrete GPU into one. 3. Remove unnecessary jobs and test plans after switching to new graphic test strategy. * Separate the test cases of laptop and desktop 1. test cases related to prime offload are used for laptops and All-in-Ones (see below) 2. simply test default renderer for desktops * Add description to let user know the different test targets of prime offload. The graphic configuration of AIO devices is similar to laptops. This kind of configuration is that iGPU will be connected to integrated monitor and some configuration of AIO come with dGPU. To cover this kind of condition, AIO is added to prime offload test group. --------- Co-authored-by: Pierre Equoy <[email protected]>
…al#942) * This is part of canonical#586 that includes the changes of job and test plan only. 1. New jobs that uses prime_offload_tester.py to validate GPU rendering. 2. New test plans that combines integrated and discrete GPU into one. 3. Remove unnecessary jobs and test plans after switching to new graphic test strategy. * Separate the test cases of laptop and desktop 1. test cases related to prime offload are used for laptops and All-in-Ones (see below) 2. simply test default renderer for desktops * Add description to let user know the different test targets of prime offload. The graphic configuration of AIO devices is similar to laptops. This kind of configuration is that iGPU will be connected to integrated monitor and some configuration of AIO come with dGPU. To cover this kind of condition, AIO is added to prime offload test group. --------- Co-authored-by: Pierre Equoy <[email protected]>
Description
Change the GPU test to prime/revert prime strategy to make test jobs to meet real usage, and remove redundant jobs.
Resolved issues
Tracking in this Jira card
Github issue #491
Submissions
I + I
I + A
A + N (This DUT will make nvidia driver error while suspend/resume)
A only