Change to new graphic test strategy (BugFix) #586

hanhsuan · 2023-06-28T05:48:18Z

Description

Change the GPU tests to a prime/reverse prime strategy so that the test jobs match real usage, and remove redundant jobs.

Resolved issues

Tracking in this Jira card
Github issue #491

Submissions

I + I

I + A

auto
graphic

A + N (This DUT will make nvidia driver error while suspend/resume)

auto
graphic

A only

auto
graphic

pieqq

Thank you for this big update! You've done great work, first with the research and documentation on the prime and reverse prime process, then with this implementation!

I made some suggestions inline, please check them and let me know if you have any questions.

I've seen that you've created new test plans (for instance graphics-gpu-cert-automated), but it's not immediately clear why. My understanding is that they are only to be used with 22.04+ test plans, in order to keep backward compatibility with previous OSes and jobs? It would be good to have some kind of explanation somewhere (I like to put this kind of info in the git commit message, because there is a higher chance of finding it again when doing code archaeology later on).

providers/base/bin/prime_offload_tester.py

providers/base/units/graphics/jobs.pxu

providers/base/bin/prime_offload_tester.py

hanhsuan · 2023-06-30T01:13:15Z

@pieqq I've fixed the code and description as you suggested. If I missed something, please let me know.

diohe0311

Hi @hanhsuan, thanks for the clear explanation!
I'll run the tests on DUTs based on our discussion.

providers/base/bin/prime_offload_tester.py

diohe0311 · 2023-10-26T17:47:28Z

I couldn't find the test jobs below when running graphics-gpu-cert-automated, and I also failed to run them directly. Am I missing something? Could you please provide your test steps?

graphics/1_glxgears_auto_.*
graphics/2_glxgears_auto_.*

Steps to reproduce:

1. Sideload the base provider on Vision-PV-SKU16_202203-30095, IOKE-PV-SKU6_202304-31461
2. Run sudo lshw -C display (check the GPU combination)
3. Run checkbox-cli
4. Select graphics-gpu-cert-automated
5. Can't find the jobs under Graphics tests
   - graphics/1_driver_version_PCI_ID_0x4688
   - graphics/1_gl_support_PCI_ID_0x4688
   - graphics/1_minimum_resolution_PCI_ID_0x4688
   - graphics/VESA_drivers_not_in_use
6. Run checkbox-cli run com.canonical.certification::graphics/glxgears, and get Error: couldn't open display (null)
7. Run ./prime_offload_tester.py -c glxgears -p 0000:01:00.0 -d nvidia -t 30, and get Error: couldn't open display (null)

The test steps are fine; they just can't be run over SSH, because the DUT will ask the host to run the graphics test and fail due to missing permission (and even if you use ssh -X to allow it, rendering will use the GPU on the host, so that still doesn't work).
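The SSH pitfall described above could be caught before launching a graphics job. The following is only a heuristic sketch, not something taken from this PR: the helper names are hypothetical, and it assumes that a local graphical session exports `DISPLAY` or `WAYLAND_DISPLAY` while an SSH login exports at least one of the standard `SSH_*` variables.

```python
import os

# Variables that OpenSSH sets in the environment of a remote login.
SSH_MARKERS = ("SSH_CONNECTION", "SSH_CLIENT", "SSH_TTY")


def has_display(env=None):
    """Return True if the environment names an X11 or Wayland display."""
    env = os.environ if env is None else env
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


def is_ssh_session(env=None):
    """Return True if the environment looks like an SSH login."""
    env = os.environ if env is None else env
    return any(marker in env for marker in SSH_MARKERS)


def can_render_on_dut(env=None):
    """Heuristic: a display is available and it is not a forwarded one.

    With X11 forwarding, DISPLAY is set but points back at the host,
    so the presence of a display alone is not enough.
    """
    return has_display(env) and not is_ssh_session(env)
```

A job could call `can_render_on_dut()` first and skip or fail early with a clear message instead of surfacing `couldn't open display (null)`.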

diohe0311 · 2023-10-27T08:59:39Z

Hey @hanhsuan, as discussed, we ran ODM Client Certification for Desktop 22.04 - (1/2) Manual tests and found automated test jobs under it, could you please check on that?

codecov · 2023-10-30T03:47:59Z

Codecov Report

Attention: Patch coverage is 96.66667% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 39.00%. Comparing base (c3c3f45) to head (2cf82cb).
Report is 180 commits behind head on main.

Files	Patch %	Lines
providers/base/bin/prime_offload_tester.py	97.43%	2 Missing and 1 partial ⚠️
providers/resource/bin/graphics_card_resource.py	66.66%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #586      +/-   ##
==========================================
+ Coverage   34.83%   39.00%   +4.16%     
==========================================
  Files         302      307       +5     
  Lines       34165    35240    +1075     
  Branches     5909     6058     +149     
==========================================
+ Hits        11903    13744    +1841     
+ Misses      21697    20890     -807     
- Partials      565      606      +41

Flag	Coverage Δ
provider-base	`14.52% <97.43%> (+11.39%)`	⬆️
provider-resource	`22.85% <66.66%> (+10.84%)`	⬆️

Flags with carried forward coverage won't be shown.


hanhsuan · 2023-10-30T05:29:09Z

Hey @hanhsuan, as discussed, we ran ODM Client Certification for Desktop 22.04 - (1/2) Manual tests and found automated test jobs under it, could you please check on that?

@diohe0311 Thanks for pointing out some issues in my PR. I have modified it so that ODM Client Certification for Desktop 22.04 - (1/2) Manual tests includes the correct test plan.

kissiel · 2023-10-30T10:05:28Z

/canonical/self-hosted-runners/run-workflows 28e1ba4

… depending on index

For Nvidia GPU, the prime/reverse prime offload is not supported before version 435.17. Therefore, This new strategy is only for 22.04+. For backward compatibility, this PR add new test plans for 22.04+ as follow:

- graphics-gpu-cert-full
- graphics-gpu-cert-automated
- graphics-gpu-cert-manual
- after-suspend-graphics-gpu-cert-full
- after-suspend-graphics-gpu-cert-automated
- after-suspend-graphics-gpu-cert-manual
- monitor-gpu-cert-full
- monitor-gpu-cert-automated
- monitor-gpu-cert-manual
- after-suspend-monitor-gpu-cert-full
- after-suspend-monitor-gpu-cert-automated
- after-suspend-monitor-gpu-cert-manual

And add new python script "prime_offload_tester.py" to execute command with prime/reverse prime setting for new test jobs as follow:

Auto test:
- graphics/{index}_auto_glxgears_{product_slug}
- graphics/{index}_auto_glxgears_fullscreen_{product_slug}

Manual:
- graphics/{index}_valid_glxgears_{product_slug}
- graphics/{index}_valid_glxgears_fullscreen_{product_slug}

pieqq · 2023-10-31T01:24:56Z

/canonical/self-hosted-runners/run-workflows 79599f8

…ster.py

diohe0311 · 2023-11-23T02:15:28Z

/canonical/self-hosted-runners/run-workflows 93c5634

diohe0311

LGTM+1

providers/base/units/monitor/test-plan.pxu

kissiel

I've hit the 20-comment mark.
Overall comments are:

- I think the logic regarding not finding the card is wrong (and the test also reflects the wrong logic)
- The error handling is unnecessarily complicated and does not follow Python conventions

The branch should have been easily split into smaller chunks. For instance the checkbox job definitions don't have to be here. Let's start with the testing program.

providers/base/tests/test_prime_offload_tester.py

providers/base/bin/prime_offload_tester.py

hanhsuan · 2023-11-30T08:10:15Z

I think the logic regarding not finding the card is wrong (and the test also reflects the wrong logic)

I couldn't figure out where the logic error is. Could you explain more about this?

2. add extra method for avoid checking fail by 6.5 kernel bug

hanhsuan · 2023-12-21T07:24:01Z

The submissions for the new commit:
Intel only
Intel + Nvidia
Intel + Nvidia remote test

hanhsuan · 2023-12-27T01:39:41Z

The 6.5 kernel bug is now public.

kissiel

- There are time.sleep()s being run when executing unit tests from this PR (it takes 2 minutes to run the unit tests)
- The code is unnecessarily complicated - see below.
- There are unnecessary double exception contexts.
- The if __name__ == "__main__": is abused, it should only call main
- The PR is too big. Please split it into smaller chunks (like one function with unit tests that does one thing).
- The tests provided here are not unit tests. They also suffer from having to monkeypatch a lot of things in the god object because of the suboptimal problem decomposition.
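To illustrate the first point, here is a minimal sketch of how a polling helper can be unit tested without really sleeping. `wait_until` is a hypothetical function written for this example, not code from the PR. The only technique shown is replacing `time.sleep` with a mock through `unittest.mock.patch`.

```python
import time
import unittest
from unittest.mock import MagicMock, patch


def wait_until(predicate, timeout=20, interval=1):
    """Poll predicate() once per interval until it is true or time runs out.

    Hypothetical helper: returns True as soon as predicate() is truthy,
    False after timeout / interval unsuccessful attempts.
    """
    for _ in range(int(timeout / interval)):
        if predicate():
            return True
        time.sleep(interval)
    return False


class WaitUntilTests(unittest.TestCase):
    @patch("time.sleep")
    def test_gives_up_after_timeout(self, mock_sleep):
        # time.sleep is a mock here, so the 20 "seconds" pass instantly.
        predicate = MagicMock(return_value=False)
        self.assertFalse(wait_until(predicate, timeout=20, interval=1))
        self.assertEqual(mock_sleep.call_count, 20)

    @patch("time.sleep")
    def test_stops_once_predicate_is_true(self, mock_sleep):
        predicate = MagicMock(side_effect=[False, False, True])
        self.assertTrue(wait_until(predicate))
        # Two failed attempts mean two sleeps before the third succeeds.
        self.assertEqual(mock_sleep.call_count, 2)
```

Because the function does one thing, the only object that needs patching is `time.sleep`, which is the kind of decomposition the review asks for.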

Overall recommendation: try isolating one problem at a time, write a function that solves that one problem together with tests for that function, and file a PR.
Then another one, another one, and so on. After 4-5 PRs, you'll be able to compose a small change to a Checkbox job that uses all those functions.

This cannot land in this state.

providers/base/bin/prime_offload_tester.py

This script provides functions to run a process on a specific GPU and validate it. A bug affects kernels from 6.3 to 6.5.0.14: https://bugs.launchpad.net/ubuntu/+source/linux-oem-6.5/+bug/2047461 Please don't use this script on those kernel versions.
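A guard for the affected kernel range could look roughly like the sketch below. It is an illustration only: the function names are hypothetical, and it assumes that "6.5.0.14" corresponds to a release string of the form `6.5.0-14-<flavour>` as reported by `platform.release()`. Flavour-specific ABI numbering (for example OEM kernels) is not handled.

```python
import re

# Assumed bounds, taken from the note above: 6.3 up to and including 6.5.0.14.
AFFECTED_FIRST = (6, 3, 0, 0)
AFFECTED_LAST = (6, 5, 0, 14)


def parse_release(release):
    """Turn '6.5.0-14-generic' into the tuple (6, 5, 0, 14).

    A missing ABI number (e.g. plain '6.3.0') is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:-(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: {!r}".format(release))
    return tuple(int(part or 0) for part in match.groups())


def is_affected(release):
    """Return True if the release falls inside the assumed affected range."""
    return AFFECTED_FIRST <= parse_release(release) <= AFFECTED_LAST
```

In a script this would typically be called as `is_affected(platform.release())` after `import platform`.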

2. Bug of 6.5 kernel is released in proposed kernel 6.5.0.16 and have tested. Therefore, removing workaround.

2. add more unit tests

hanhsuan · 2024-01-16T07:42:09Z

The patch for 6.5 kernel has been released to proposed (6.5.0.16). Therefore the workaround isn't needed anymore.

submission
submission of this script without workaround that runs on 6.5.0.16

I moved the job and test plan changes to another PR, which will modify only the 24.04 test plan to reduce the impact on SRU.

…st plan only.
1. New jobs that use prime_offload_tester.py to validate GPU rendering.
2. New test plans that combine integrated and discrete GPU into one.
3. Remove unnecessary jobs and test plans after switching to new graphic test strategy.

ref:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/pci/early.c?id=refs/tags/v3.12.7#n65
https://wiki.xenproject.org/wiki/Bus:Device.Function_(BDF)_Notation
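For readers unfamiliar with the notation in the references above, a BDF check can be sketched as follows. This is not the check that was merged. It assumes the usual layout `[domain:]bus:device.function`, with a 16-bit domain, an 8-bit bus, a 5-bit device (00 to 1f) and a 3-bit function (0 to 7), all written in hexadecimal. `is_valid_bdf` is a hypothetical name.

```python
import re

BDF_PATTERN = re.compile(
    r"(?:[0-9a-fA-F]{4}:)?"  # optional 16-bit domain, e.g. 0000:
    r"[0-9a-fA-F]{2}:"       # 8-bit bus, 00-ff
    r"[01][0-9a-fA-F]\."     # 5-bit device, 00-1f
    r"[0-7]"                 # 3-bit function, 0-7
)


def is_valid_bdf(text):
    """Return True if text is a PCI address such as 0000:01:00.0."""
    # fullmatch rejects trailing characters, including a stray newline.
    return BDF_PATTERN.fullmatch(text) is not None
```

The address used earlier in this conversation, `0000:01:00.0`, passes this check, while `0000:01:20.0` does not because device 0x20 does not fit in 5 bits.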

kissiel

The function decomposition proposed here introduces unnecessary complexity, and makes both the logic and the tests less readable and harder to maintain.

There are some problem around

I'm providing comments and suggestions inline.

providers/base/bin/prime_offload_tester.py

Co-authored-by: kissiel <[email protected]>

2. fix docstring error 3. change default to 20s and the logic in the check_offload 4. change RuntimeError to SystemExit

hanhsuan · 2024-03-05T05:26:56Z

@kissiel I have fixed the code. Please review it when you have time. Thanks.

kissiel

There are a few tweaks that could be applied to this PR, but it's not critical, so we can land this.

Thank you for the very extensive work on this!

The pxus got removed, so Pierre's request is no longer valid.

* Changing gpu test strategy to prime/reverse-prime gpu offload without depending on index

  For Nvidia GPU, the prime/reverse prime offload is not supported before version 435.17. Therefore, This new strategy is only for 22.04+. For backward compatibility, this PR add new test plans for 22.04+ as follow:
  - graphics-gpu-cert-full
  - graphics-gpu-cert-automated
  - graphics-gpu-cert-manual
  - after-suspend-graphics-gpu-cert-full
  - after-suspend-graphics-gpu-cert-automated
  - after-suspend-graphics-gpu-cert-manual
  - monitor-gpu-cert-full
  - monitor-gpu-cert-automated
  - monitor-gpu-cert-manual
  - after-suspend-monitor-gpu-cert-full
  - after-suspend-monitor-gpu-cert-automated
  - after-suspend-monitor-gpu-cert-manual

  And add new python script "prime_offload_tester.py" to execute command with prime/reverse prime setting for new test jobs as follow:

  Auto test:
  - graphics/{index}_auto_glxgears_{product_slug}
  - graphics/{index}_auto_glxgears_fullscreen_{product_slug}

  Manual:
  - graphics/{index}_valid_glxgears_{product_slug}
  - graphics/{index}_valid_glxgears_fullscreen_{product_slug}
* Add more unit test for graphics_card_resource.py and prime_offload_tester.py
* Add one more unit test
* move parse arguments to single function for unit testing
* Fix flake8 error
* 1. Refactory to be more like python
  2. add extra method for avoid checking fail by 6.5 kernel bug
* Fix flake8 error
* add executable permission
* 1. Move changes of job and test-plan to another PR
  2. Bug of 6.5 kernel is released in proposed kernel 6.5.0.16 and have tested. Therefore, removing workaround.
* 1. Move change of jobs and test plan to another PR
  2. add more unit tests
* Fix pci BDF format check error

  ref:
  https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/pci/early.c?id=refs/tags/v3.12.7#n65
  https://wiki.xenproject.org/wiki/Bus:Device.Function_(BDF)_Notation
* Update providers/base/bin/prime_offload_tester.py

  Co-authored-by: kissiel <[email protected]>
* Update providers/base/bin/prime_offload_tester.py

  Co-authored-by: kissiel <[email protected]>
* Update providers/base/bin/prime_offload_tester.py

  Co-authored-by: kissiel <[email protected]>
* 1. move the get clients from check_offload to get_client
  2. fix docstring error
  3. change default to 20s and the logic in the check_offload
  4. change RuntimeError to SystemExit

---------

Co-authored-by: kissiel <[email protected]>
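As a rough illustration of "execute command with prime/reverse prime setting", the sketch below builds the environment for an offloaded process. It is an assumption about the general technique, not a copy of prime_offload_tester.py: `offload_env` is a hypothetical helper, the NVIDIA variables are the ones documented for PRIME render offload in the NVIDIA driver README, and the `pci-...` form of `DRI_PRIME` is the device tag form accepted by Mesa.

```python
import os


def offload_env(pci_bdf, driver, base_env=None):
    """Return a copy of the environment that selects the given GPU.

    pci_bdf: PCI address such as '0000:01:00.0'
    driver:  kernel driver name, e.g. 'nvidia', 'i915' or 'amdgpu'
    """
    env = dict(os.environ if base_env is None else base_env)
    if driver == "nvidia":
        # NVIDIA proprietary driver: PRIME render offload (435.17 or newer).
        env["__NV_PRIME_RENDER_OFFLOAD"] = "1"
        env["__GLX_VENDOR_LIBRARY_NAME"] = "nvidia"
    else:
        # Mesa drivers: pick the device by its PCI tag instead of an index.
        tag = pci_bdf.replace(":", "_").replace(".", "_")
        env["DRI_PRIME"] = "pci-" + tag
    return env
```

A command such as glxgears could then be started with `subprocess.Popen(["glxgears"], env=offload_env("0000:01:00.0", "nvidia"))`. Selecting the GPU by PCI address rather than by index is what lets the jobs stop depending on the card order.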

* This is part of #586 that includes the changes of job and test plan only.
  1. New jobs that uses prime_offload_tester.py to validate GPU rendering.
  2. New test plans that combines integrated and discrete GPU into one.
  3. Remove unnecessary jobs and test plans after switching to new graphic test strategy.
* Separate the test cases of laptop and desktop
  1. test cases related to prime offload are used for laptops and All-in-Ones (see below)
  2. simply test default renderer for desktops
* Add description to let user know the different test targets of prime offload.

  The graphic configuration of AIO devices is similar to laptops. This kind of configuration is that iGPU will be connected to integrated monitor and some configuration of AIO come with dGPU. To cover this kind of condition, AIO is added to prime offload test group.

---------

Co-authored-by: Pierre Equoy <[email protected]>


hanhsuan requested review from pieqq and yphus June 28, 2023 05:48

pieqq previously requested changes Jun 29, 2023


hanhsuan force-pushed the change_to_new_graphic_test_strategy branch from 141c780 to aaf9eda Compare June 30, 2023 01:07

diohe0311 reviewed Oct 26, 2023

providers/base/bin/prime_offload_tester.py

hanhsuan force-pushed the change_to_new_graphic_test_strategy branch from 67c4a91 to b129c98 Compare October 30, 2023 03:46

hanhsuan force-pushed the change_to_new_graphic_test_strategy branch from 28e1ba4 to 79599f8 Compare October 31, 2023 00:09

hanhsuan changed the title ~~Change to new graphic test strategy~~ Change to new graphic test strategy (BugFix) Oct 31, 2023

hanhsuan added 4 commits October 31, 2023 10:11

Add more unit test for graphics_card_resource.py and prime_offload_te…

0d3acb1

…ster.py

Add one more unit test

2c0f586

move parse arguments to single function for unit testing

098b060

Fix flake8 error

93c5634

diohe0311 previously approved these changes Nov 23, 2023

providers/base/units/monitor/test-plan.pxu

kissiel suggested changes Nov 24, 2023


hanhsuan mentioned this pull request Dec 13, 2023

`id: graphics/{index}_valid_opengl_renderer_{product_slug}` shouldn't expect DRI_PRIME=1 is dGPU #889

Closed

1. Refactory to be more like python

652bac1

2. add extra method for avoid checking fail by 6.5 kernel bug

hanhsuan dismissed diohe0311’s stale review via 652bac1 December 18, 2023 05:23

hanhsuan added 2 commits December 18, 2023 13:35

Fix flake8 error

2b5cfc7

add executable permission

8df58bf

hanhsuan requested review from pieqq, diohe0311 and kissiel December 21, 2023 07:27

kissiel suggested changes Jan 3, 2024


baconYao mentioned this pull request Jan 11, 2024

Refactor cpufreq tests with unit tests canonical/checkbox-provider-ce-oem#72

Draft

hanhsuan added 2 commits January 16, 2024 13:44

1. Move changes of job and test-plan to another PR

5c1fa99

2. Bug of 6.5 kernel is released in proposed kernel 6.5.0.16 and have tested. Therefore, removing workaround.

1. Move change of jobs and test plan to another PR

dfe7cf7

2. add more unit tests

hanhsuan requested a review from kissiel January 16, 2024 07:42

hanhsuan mentioned this pull request Jan 17, 2024

Change to new graphic test strategy - jobs and test plans (BugFix) #942

Merged

Fix pci BDF format check error

c5212d6

ref: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/pci/early.c?id=refs/tags/v3.12.7#n65 https://wiki.xenproject.org/wiki/Bus:Device.Function_(BDF)_Notation

kissiel suggested changes Feb 29, 2024


hanhsuan and others added 4 commits March 1, 2024 10:04

Update providers/base/bin/prime_offload_tester.py

cb3892f

Co-authored-by: kissiel <[email protected]>

Update providers/base/bin/prime_offload_tester.py

520fba5

Co-authored-by: kissiel <[email protected]>

Update providers/base/bin/prime_offload_tester.py

a57f17b

Co-authored-by: kissiel <[email protected]>

1. move the get clients from check_offload to get_client

2cf82cb

2. fix docstring error 3. change default to 20s and the logic in the check_offload 4. change RuntimeError to SystemExit

hanhsuan requested a review from kissiel March 5, 2024 05:25

hanhsuan mentioned this pull request Mar 8, 2024

graphics/{index}_glxgears_{product_slug} won't fail with wrong renderer #1027

Closed

kissiel approved these changes Mar 18, 2024


kissiel merged commit d8063c2 into canonical:main Mar 18, 2024
14 checks passed

pieqq mentioned this pull request Apr 10, 2024

graphics_card_resource.py couldn't define index for iGPU and dGPU correctly #491

Closed
