
OpenMP offloaded code crashes at compilation/runtime depending on optimization flags #122260

Open
JakobSchaeffeler opened this issue Jan 9, 2025 · 2 comments

JakobSchaeffeler commented Jan 9, 2025

I'm trying to compile the stencil3d-omp benchmark of HeCBench: https://github.com/zjin-lcf/HeCBench/blob/master/src/stencil3d-omp/main.cpp

I'm using LLVM version 19.1.3 and I'm offloading to an MI100 AMD GPU.

If I compile the code with -O3, everything works and the results match those from SYCL and HIP:

```
make CC=clang++ CFLAGS="-fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx908"
```

If I compile the code without any optimization flags, compilation succeeds, but at runtime I get the following error:

```
AMDGPU fatal error 1: Memory access fault by GPU 8 (agent 0x55f222c19fb0) at virtual address 0x7f8c9c06a000. Reasons: Page not present or supervisor privilege
Aborted (core dumped)
```

Lastly, if I compile the code with -O0, -O1, or -O2, I get a segfault during compilation: O0_compilation_output.txt

The only difference I was able to find between the O0 and O3 versions is that the O0 version launches the OpenMP kernels in generic mode, while the O3 version launches them in generic-SPMD mode. Could this be the reason for the crash?

Member

llvmbot commented Jan 9, 2025

@llvm/issue-subscribers-offload


Contributor

jhuber6 commented Jan 9, 2025

Offloading failing at O0 is quite common, since with no optimizations you're more likely to run out of stack space or some other resource. The compiler crashing is definitely more concerning; it seems to be crashing in AMDGPUResourceUsageAnalysis. Could you compile with -save-temps and provide the IR? Something that reproduces via https://godbolt.org/z/hj3KEjPfd would be ideal, but just having the GPU IR would help.

> The only difference I was able to find between the O0 and O3 version is that the O0 version launches the OpenMP kernels in generic mode and the O3 in generic-SPMD mode. Could this be the reason for the crash?

No, that's just an optimization that's run at O1. The code should work without the transform, but it will be a lot slower.
