-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix DKMS include path parsing #117
base: master
Are you sure you want to change the base?
Conversation
Add the MODE register into the per-wave debug information. This register holds state such as FP rounding and denorm modes, which exceptions are enabled, and active clamping modes. Signed-off-by: Joseph Greathouse <[email protected]> Acked-by: Alex Deucher <[email protected]>
If the platform uses BOCO, don't use BACO in runtime suspend. We could end up executing the BACO path if the platform supports both. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1669 Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
…ate() and amdgpu_ttm_tt_unpopulate() The varialbe gtt in the function amdgpu_ttm_tt_populate() and amdgpu_ttm_tt_unpopulate() is guaranteed to be not NULL in the context. Thus the null-pointer checks are redundant and can be dropped. Reported-by: TOTE Robot <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Tuo Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Enable pp_num_states, pp_cur_state, pp_force_state, pp_table sysfs under SRIOV 1-VF scenario. Signed-off-by: Jiawei Gu <[email protected]>
Add u32 gfx_target_version field to kfd_node_properties and kfd_device_info. Populate <asic>_device_info structs accordingly and expose to sysfs. This allows eliminating device-ID-based lookup tables in user mode for future ASICs. Signed-off-by: Graham Sider <[email protected]> Reviewed-by: Felix Kuehling <[email protected]>
…ables_init()' 'watermarks_table' must be freed instead 'clocks_table', because 'clocks_table' is known to be NULL at this point and 'watermarks_table' is never freed if the last kzalloc fails. Fixes: c98ee89 ("drm/amd/pm: add the fine grain tuning function for vangogh") Signed-off-by: Christophe JAILLET <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
In some systems only MACO is supported. This is to fix the problem that runtime pm is enabled but BACO is not supported. MACO will be handled seperately. Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Jack Gui <[email protected]>
The variable eng_id is being initialized with a value that is never read, it is being re-assigned on the next statment. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
- Extend wait time and add retry, currently 6s * 2times - Change timing algorithm Signed-off-by: Victor Zhao <[email protected]> Signed-off-by: Peng Ju Zhou <[email protected]> Reviewed-by: Victor Zhao <[email protected]>
When init failed in early init stage, amdgpu_object has not been initialized, so hasn't the ttm delayed queue functions. Signed-off-by: YuBiao Wang <[email protected]> Reviewed-by: Emily.Deng <[email protected]>
Add device ids. Signed-off-by: Chengming Gui <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Rui Teng <[email protected]>
There may be multiple instances and only one is harvested. v2: fix typo in commit message Fixes: 83a0b86 ("drm/amdgpu: add judgement when add ip blocks (v2)") Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1673 Reviewed-by: Guchun Chen <[email protected]> Reviewed-by: James Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Reviewed-by: Dmytro Laktyushkin <[email protected]> Acked-by: Anson Jacob <[email protected]> Signed-off-by: Eric Bernstein <[email protected]> Cc: [email protected] Tested-by: Daniel Wheeler <[email protected]>
[Why] If the plane has been removed, the writeback disablement logic doesn't run [How] fix the logic order Acked-by: Anson Jacob <[email protected]> Signed-off-by: Roy Chan <[email protected]> Tested-by: Daniel Wheeler <[email protected]>
…cking logic Acked-by: Anson Jacob <[email protected]> Signed-off-by: Roy Chan <[email protected]> Tested-by: Daniel Wheeler <[email protected]>
Acked-by: Anson Jacob <[email protected]> Signed-off-by: Roy Chan <[email protected]> Tested-by: Daniel Wheeler <[email protected]>
[How] the programming sequeune was for old asic. the correct programming sequeunce should be similar to the one used in mpc. the fix is copied from the mpc programming sequeunce. Reviewed-by: Anthony Koo <[email protected]> Acked-by: Anson Jacob <[email protected]> Signed-off-by: Roy Chan <[email protected]> Tested-by: Daniel Wheeler <[email protected]>
Acked-by: Anson Jacob <[email protected]> Signed-off-by: Roy Chan <[email protected]> Tested-by: Daniel Wheeler <[email protected]>
[Why] Developers can find it useful if the driver can produce AUX traces without special equipment. [How] Add AUX tracing. Reviewed-by: Zhan Liu <[email protected]> Acked-by: Anson Jacob <[email protected]> Signed-off-by: Ashley Thomas <[email protected]> Tested-by: Daniel Wheeler <[email protected]>
[why] DM needs to be notified when hdcp module has completed authentication attempt. Reviewed-by: Bhawanpreet Lakha <[email protected]> Reviewed-by: George Shen <[email protected]> Acked-by: Anson Jacob <[email protected]> Signed-off-by: Wenjing Liu <[email protected]> Tested-by: Daniel Wheeler <[email protected]>
Acked-by: Anson Jacob <[email protected]> Signed-off-by: Anthony Koo <[email protected]> Tested-by: Daniel Wheeler <[email protected]>
This version brings along following fixes: - Fix memory allocation in dm IRQ context to use GFP_ATOMIC - Increase timeout threshold for DMCUB reset - Clear GPINT after DMCUB has reset - Add AUX I2C tracing - Fix code commenting style - Some refactoring - Remove invalid assert for ODM + MPC case Reviewed-by: Wyatt Wood <[email protected]> Acked-by: Anson Jacob <[email protected]> Signed-off-by: Anthony Koo <[email protected]> Tested-by: Daniel Wheeler <[email protected]>
[Why] Otherwise we can end up processing whatever was left in the register if the DMCUB was previously reset. If DMCUB gets force reset too early from another client then we might not have even acked the disable yet - causing DMCUB instantly shutdown if the command was 10020000. [How] Move the GPINT clear outside of the reset loop and do it unconditionally after the DMCUB has been properly reset. Reviewed-by: Roy Chan <[email protected]> Reviewed-by: Eric Yang <[email protected]> Acked-by: Anson Jacob <[email protected]> Signed-off-by: Nicholas Kazlauskas <[email protected]> Tested-by: Daniel Wheeler <[email protected]>
[Why] If we're backdoor loading the DMCUB performs more work than just the PHY reset so we can end up resetting before the cleanup has fully finished. [How] Increase timeout, add udelay between spins to guarantee a minimum. Reviewed-by: Roy Chan <[email protected]> Reviewed-by: Eric Yang <[email protected]> Acked-by: Anson Jacob <[email protected]> Signed-off-by: Nicholas Kazlauskas <[email protected]> Tested-by: Daniel Wheeler <[email protected]>
Replace GFP_KERNEL with GFP_ATOMIC as amdgpu_dm_irq_schedule_work can't sleep. BUG: sleeping function called from invalid context at include/linux/sched/mm.h:196 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 253, name: kworker/6:1H CPU: 6 PID: 253 Comm: kworker/6:1H Tainted: G W OE 5.11.0-promotion_2021_06_07-18_36_28_prelim_revert_retrain ROCm#8 Hardware name: System manufacturer System Product Name/PRIME X570-PRO, BIOS 3405 02/01/2021 Workqueue: events_highpri dm_irq_work_func [amdgpu] Call Trace: <IRQ> dump_stack+0x5e/0x74 ___might_sleep.cold+0x87/0x98 __might_sleep+0x4b/0x80 kmem_cache_alloc_trace+0x390/0x4f0 amdgpu_dm_irq_handler+0x171/0x230 [amdgpu] amdgpu_irq_dispatch+0xc0/0x1e0 [amdgpu] amdgpu_ih_process+0x81/0x100 [amdgpu] amdgpu_irq_handler+0x26/0xa0 [amdgpu] __handle_irq_event_percpu+0x49/0x190 ? __hrtimer_get_next_event+0x4d/0x80 handle_irq_event_percpu+0x33/0x80 handle_irq_event+0x33/0x60 handle_edge_irq+0x82/0x190 asm_call_irq_on_stack+0x12/0x20 </IRQ> common_interrupt+0xbb/0x140 asm_common_interrupt+0x1e/0x40 RIP: 0010:amdgpu_device_rreg.part.0+0x44/0xf0 [amdgpu] Code: 53 48 89 fb 4c 3b af c8 08 00 00 73 6d 83 e2 02 75 0d f6 87 40 62 01 00 10 0f 85 83 00 00 00 4c 03 ab d0 08 00 00 45 8b 6d 00 <8b> 05 3e b6 52 00 85 c0 7e 62 48 8b 43 08 0f b7 70 3e 65 8b 05 e3 RSP: 0018:ffffae7740fff9e8 EFLAGS: 00000286 RAX: ffffffffc05ee610 RBX: ffff8aaf8f620000 RCX: 0000000000000006 RDX: 0000000000000000 RSI: 0000000000005430 RDI: ffff8aaf8f620000 RBP: ffffae7740fffa08 R08: 0000000000000001 R09: 000000000000000a R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000005430 R13: 0000000071000000 R14: 0000000000000001 R15: 0000000000005430 ? amdgpu_cgs_write_register+0x20/0x20 [amdgpu] amdgpu_device_rreg+0x17/0x20 [amdgpu] amdgpu_cgs_read_register+0x14/0x20 [amdgpu] dm_read_reg_func+0x38/0xb0 [amdgpu] generic_reg_wait+0x80/0x160 [amdgpu] dce_aux_transfer_raw+0x324/0x7c0 [amdgpu] dc_link_aux_transfer_raw+0x43/0x50 [amdgpu] dm_dp_aux_transfer+0x87/0x110 [amdgpu] drm_dp_dpcd_access+0x72/0x110 [drm_kms_helper] drm_dp_dpcd_read+0xb7/0xf0 [drm_kms_helper] drm_dp_get_one_sb_msg+0x349/0x480 [drm_kms_helper] drm_dp_mst_hpd_irq+0xc5/0xe40 [drm_kms_helper] ? drm_dp_mst_hpd_irq+0xc5/0xe40 [drm_kms_helper] dm_handle_hpd_rx_irq+0x184/0x1a0 [amdgpu] ? dm_handle_hpd_rx_irq+0x184/0x1a0 [amdgpu] handle_hpd_rx_irq+0x195/0x240 [amdgpu] ? __switch_to_asm+0x42/0x70 ? __switch_to+0x131/0x450 dm_irq_work_func+0x19/0x20 [amdgpu] process_one_work+0x209/0x400 worker_thread+0x4d/0x3e0 ? cancel_delayed_work+0xa0/0xa0 kthread+0x124/0x160 ? kthread_park+0x90/0x90 ret_from_fork+0x22/0x30 Reviewed-by: Aurabindo Jayamohanan Pillai <[email protected]> Acked-by: Anson Jacob <[email protected]> Signed-off-by: Anson Jacob <[email protected]> Cc: [email protected] Tested-by: Daniel Wheeler <[email protected]>
Building with W=1 complains about an empty 'else' statement, so use the usual do-nothing-while-0 loop to quieten this warning. ../drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_psr.c:113:53: warning: suggest braces around empty body in an 'else' statement [-Wempty-body] 113 | *state, retry_count); Fixes: b30eda8 ("drm/amd/display: Add ETW log to dmub_psr_get_state") Signed-off-by: Randy Dunlap <[email protected]> Cc: Wyatt Wood <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Cc: "Pan, Xinhui" <[email protected]> Cc: Harry Wentland <[email protected]> Cc: Leo Li <[email protected]> Cc: [email protected] Cc: [email protected] Cc: David Airlie <[email protected]> Cc: Daniel Vetter <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Don't use "begin kernel-doc notation" (/**) for comments that are not kernel-doc. This eliminates warnings reported by the 0day bot. drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:89: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * This shader is used to clear VGPRS and LDS, and also write the input drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:209: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * The below shaders are used to clear SGPRS, and also write the input drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:301: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * This shader is used to clear the uninitiated sgprs after the above Fixes: 0e0036c ("drm/amdgpu: fix no full coverage issue for gprs initialization") Signed-off-by: Randy Dunlap <[email protected]> Reported-by: kernel test robot <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Cc: "Pan, Xinhui" <[email protected]> Cc: Dennis Li <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Alex Deucher <[email protected]>
initial modification of files smu_cmn.c navi10_ppt.c === Test === AMDGPU_PCI_ADDR=`lspci -nn | grep "VGA\|Display" | cut -d " " -f 1` AMDGPU_HWMON=`ls -la /sys/class/hwmon | grep $AMDGPU_PCI_ADDR | awk '{print $9}'` HWMON_DIR=/sys/class/hwmon/${AMDGPU_HWMON} LOGFILE=pp_printf.test.log lspci -nn | grep "VGA\|Display" > $LOGFILE FILES="pp_dpm_sclk pp_sclk_od pp_mclk_od pp_dpm_pcie pp_od_clk_voltage pp_power_profile_mode " for f in $FILES do echo === $f === >> $LOGFILE cat $HWMON_DIR/device/$f >> $LOGFILE done cat $LOGFILE Signed-off-by: Darren Powell <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
modification of smu11 files arcturus_ppt.c sienna_cichlid_ppt.c vangogh_ppt.c === Test === AMDGPU_PCI_ADDR=`lspci -nn | grep "VGA\|Display" | cut -d " " -f 1` AMDGPU_HWMON=`ls -la /sys/class/hwmon | grep $AMDGPU_PCI_ADDR | awk '{print $9}'` HWMON_DIR=/sys/class/hwmon/${AMDGPU_HWMON} LOGFILE=pp_printf.test.log lspci -nn | grep "VGA\|Display" > $LOGFILE FILES="pp_dpm_sclk pp_power_profile_mode " for f in $FILES do echo === $f === >> $LOGFILE cat $HWMON_DIR/device/$f >> $LOGFILE done cat $LOGFILE Signed-off-by: Darren Powell <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
initial modification of files renoir_ppt.c aldebaran_ppt.c yellow_carp_ppt.c === Test === AMDGPU_PCI_ADDR=`lspci -nn | grep "VGA\|Display" | cut -d " " -f 1` AMDGPU_HWMON=`ls -la /sys/class/hwmon | grep $AMDGPU_PCI_ADDR | awk '{print $9}'` HWMON_DIR=/sys/class/hwmon/${AMDGPU_HWMON} LOGFILE=pp_printf.test.log lspci -nn | grep "VGA\|Display" > $LOGFILE FILES="pp_dpm_sclk pp_power_profile_mode " for f in $FILES do echo === $f === >> $LOGFILE cat $HWMON_DIR/device/$f >> $LOGFILE done cat $LOGFILE Signed-off-by: Darren Powell <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
This fix the deadlock with the BO reservations during SVM_BO evictions while allocations in VRAM are concurrently performed. More specific, while the ttm waits for the fence to be signaled (ttm_bo_wait), it already has the BO reserved. In parallel, the restore worker might be running, prefetching memory to VRAM. This also requires to reserve the BO, but blocks the mmap semaphore first. The deadlock happens when the SVM_BO eviction worker kicks in and waits for the mmap semaphore held in restore worker. Preventing signal the fence back, causing the deadlock until the ttm times out. We don't need to hold the BO reservation anymore during validation and mapping. Now the physical addresses are taken from hmm_range_fault. We also take migrate_mutex to prevent range migration while validate_and_map update GPU page table. Signed-off-by: Alex Sierra <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Philip Yang <[email protected]> (cherry picked from commit e934786f57dd4039c3dfacfef2910f3ad29ad8b5) Change-Id: I75ec8b036366d0f8571990ed7c14a13ef21ba82a
sysfs_emit and sysfs_emit_at requrie a page boundary aligned buf address. Make them happy! v2: use an inline function. Warning Log: [ 492.545174] invalid sysfs_emit_at: buf:00000000f19bdfde at:0 [ 492.546416] WARNING: CPU: 7 PID: 1304 at fs/sysfs/file.c:765 sysfs_emit_at+0x4a/0xa0 [ 492.654805] Call Trace: [ 492.655353] ? smu_cmn_get_metrics_table+0x40/0x50 [amdgpu] [ 492.656780] vangogh_print_clk_levels+0x369/0x410 [amdgpu] [ 492.658245] vangogh_common_print_clk_levels+0x77/0x80 [amdgpu] [ 492.659733] ? preempt_schedule_common+0x18/0x30 [ 492.660713] smu_print_ppclk_levels+0x65/0x90 [amdgpu] [ 492.662107] amdgpu_get_pp_od_clk_voltage+0x13d/0x190 [amdgpu] [ 492.663620] dev_attr_show+0x1d/0x40 Signed-off-by: Lang Yu <[email protected]> Acked-by: Huang Rui <[email protected]>
Current RUNPM mechanism relies on PMFW to master the timing for BACO in/exit. And that needs cooperation from sound driver for dstate change notification for function 1(audio). Otherwise(on sound driver missing), BACO cannot be kicked in correctly and hang will be observed on RUNPM exit. By switching back to legacy message way on sound driver missing, we are able to fix the runpm hang observed for the scenario below: amdgpu driver loaded -> runpm suspend kicked -> sound driver loaded Change-Id: I0e44fef11349b5e45e6102913eb46c8c7d279c65 Signed-off-by: Evan Quan <[email protected]> Reported-and-tested-by: Pierre-Eric Pelloux-Prayer <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: Guchun Chen <[email protected]>
SVM range may includes multiple VMAs with different vm_flags, if prange page index is the last page of the VMA offset + npages, update GPU mapping to create GPU page table with same VMA access permission. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]>
No function change, use pr_debug_ratelimited to avoid per page debug message overflowing dmesg buf and console log. use dev_err to show error message from unexpected situation, to provide clue to help debug without enabling dynamic debug log. Define dev_fmt to output function name in error message. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]>
migrate_vma_setup may return cpages 0, means 0 page can be migrated, treat this as error case to skip the rest of vma migration steps. Change svm_migrate_vma_to_vram and svm_migrate_vma_to_ram to return the number of pages migrated successfully or error code. The caller add up all the successful migration pages and update prange->actual_loc only if the total migrated pages is not 0. This also removes the warning message "VRAM BO missing during validation" if migration cpages is 0. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Flora Cui <[email protected]> Reviewed-by: Guchun Chen <[email protected]>
Signed-off-by: Flora Cui <[email protected]> Reviewed-by: Guchun Chen <[email protected]>
Enable GFX RAS under generic RAS enablement path Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: John Clements <[email protected]> Change-Id: I41ccc83bee77c4f0c0f50379581cbaf0902a6afd (cherry picked from commit a0020492cbeff6952d49b00148ff8b5fd47f4422) (cherry picked from commit 1abe53dc78f9c2a495256ca8c1b08c08c38648a2)
When creating unregistered new svm range to recover retry fault, avoid new svm range to overlap with ranges or userptr ranges managed by TTM, otherwise svm migration will trigger TTM or userptr eviction, to evict user queues unexpectedly. Change helper amdgpu_ttm_tt_affect_userptr to return userptr which is inside the range. Add helper svm_range_check_vm_userptr to scan all userptr of the vm, and return overlap userptr bo start, last. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]>
In older kernel version 5.9, migrate.cpages reset to 0 by migrate_vma_finalize, so use local variable to save migrate.cpages. This change work for older kernels and upstream kernel. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]>
The userptr can be unmapped by application and still registered to driver, restore userptr work return user pages will get -EFAULT bad address error. Pretend this error as succeed. GPU access this userptr will have VM fault later, it is better than application soft hangs with stalled user mode queues. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]>
On arcturus, not all platforms use PMFW based fan control. On such ASICs fan control by PMFW will be disabled in PPTable. Disable hwmon knobs for fan control also as it is not possible to report or control fan speed on such platforms through driver. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
filter-out first argument is the space separated list of patterns. Current implementation will remove any individual "-include" positional arguments which will make cmdline invalid. Signed-off-by: Sergei Iudin <[email protected]>
This was also just found and fixed internally. I don't think the fix made it into ROCm 4.5, which was just released last night. It should make it into the next release after that. Our internal patch looks different, though:
There was some discussion about a more reliable way to fix this. I'm pointing them at your patch for reference. Thank you for reporting the issue and working out a fix. |
"-include %/kconfig.h" works too. Feel free to close PR if you think it's irrelevant or let me know if you want me to change it from sed to "filter-out" stanza. |
filter-out first argument is the space separated list of patterns.
Current implementation will remove any individual "-include" positional
arguments which will make cmdline invalid.
more details can be found in ROCm/ROCm#1601