Crash: Samsung S24 seems to crash with validation layer #9317
@Nielsbishere I am trying to grasp what is going on and to reproduce a minimal test by running this through SPIRV-Cross (https://godbolt.org/z/qE95f8jvb). I at first thought it might be the huge So it seems GPU-AV is not crashing, but at least on Linux Mesa with top-of-tree (ToT) drivers, it takes 20 seconds to compile this pipeline. We add 113 instrumentation checks on descriptor indexes, and if I limit that to something like 10 or 30, I see compile time build up linearly. If you could get me a stack trace, I would like to confirm this idea: GPU-AV is creating such a large shader that the Android compiler just gives up and crashes.
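To make the "instrumentation on descriptor indexes" idea concrete, here is a CPU-side analogy, not GPU-AV's actual SPIR-V output. It assumes the instrumented form of each access is "check the index, report if it is out of range, otherwise load". The types and `checked_load` are hypothetical names. The point is that every access site carries its own copy of the check, so shader size (and compile time) grows with the number of instrumented accesses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a descriptor array (not a Vulkan type). */
typedef struct {
    const uint32_t *elements; /* the "descriptors" */
    size_t count;             /* number of valid entries */
} DescriptorArray;

/* Hypothetical record of what the instrumentation observed. */
typedef struct {
    size_t checks_executed; /* instrumented accesses that ran */
    size_t errors_reported; /* out-of-bounds indices that were caught */
} ValidationState;

/* Analogy for one instrumented access: in the real shader, each access
   site gets its own copy of this branch, so 113 sites mean 113 copies. */
static uint32_t checked_load(const DescriptorArray *arr, size_t index,
                             ValidationState *state) {
    state->checks_executed++;
    if (index >= arr->count) {
        state->errors_reported++;
        return 0; /* safe fallback instead of reading out of bounds */
    }
    return arr->elements[index];
}
```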
The reason this array is so big is poor man's bindless: all of the arrays are sized large as an initial test (of course this wastes memory, so in the future they will be dynamically sized and reduced to a much more reasonable size). Why is this such a slow shader to compile? Is it just because of the descriptor array size? What is a reliable way to get a stack trace on crash? I tried installing a signal handler and printing the stack trace there (this works perfectly fine in most cases, such as on macOS, Linux and Windows, including on segfaults), but it doesn't seem to work here. It just crashes without giving me any info. Maybe I made a mistake in the stack trace code; I'll try again. Is it possible that Android has something similar to the Windows TDR limit, where after a certain time it times you out and just aborts (but in this case not cleanly, and not because of a render but because of a shader compile)?
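A minimal sketch of the signal-handler approach described above, assuming a POSIX environment (Android's bionic libc provides `sigaction`). The handler name and the list of signals are illustrative choices. The actual stack capture is left out: only async-signal-safe functions should be called inside a handler, and which unwinder to use is platform-specific.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical crash handler: report, then let the process die normally. */
static void crash_handler(int sig, siginfo_t *info, void *ucontext) {
    (void)info;
    (void)ucontext;
    static const char msg[] = "fatal signal caught\n";
    /* write() is async-signal-safe; printf() is not. */
    ssize_t written = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)written;
    /* Restore the default action and re-raise, so the process still
       terminates and any platform crash reporting still runs. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Installs crash_handler for a set of catchable fatal signals.
   Returns 0 on success, -1 if any sigaction() call fails. */
static int install_crash_handlers(void) {
    const int sigs[] = {SIGSEGV, SIGABRT, SIGBUS, SIGILL, SIGFPE, SIGTERM};
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = crash_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++) {
        if (sigaction(sigs[i], &sa, NULL) != 0) {
            return -1;
        }
    }
    return 0;
}
```

A handler like this only helps if the process actually receives one of these signals. If it is ended some other way, the handler never runs.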
So the array size is NOT the issue; the core issue is when you have something like
we wrap every
so you don't crash. We could (and have talked about this many times) do something where we only check the first use of
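A rough sketch of the "only check the first use" idea, under the assumption that repeated accesses through the same index expression could share a single check. The integers are hypothetical stand-ins for index expressions in the shader (not runtime values), and this is not how GPU-AV is implemented. Instrumenting every access costs one check per access; the first-use variant costs one check per distinct index.

```c
#include <stdbool.h>
#include <stddef.h>

/* Counts the checks needed if only the first access through each
   distinct index expression is instrumented. Every later access with
   the same expression would reuse that earlier result. */
static size_t checks_first_use(const size_t *index_exprs, size_t n) {
    size_t checks = 0;
    for (size_t i = 0; i < n; i++) {
        bool seen = false;
        for (size_t j = 0; j < i; j++) {
            if (index_exprs[j] == index_exprs[i]) {
                seen = true;
                break;
            }
        }
        if (!seen) {
            checks++;
        }
    }
    return checks;
}
```

For the access sequence `{7, 7, 7, 3, 7, 3}`, instrumenting every access costs 6 checks, while the first-use variant costs 2.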
Sounds tricky :/ I'll try to get a call stack soon. As a note, the real code (https://github.com/Oxsomi/rt_core/blob/main/res/shaders/indirect_prepare.hlsl) also suggests to me that DXC is not generating optimal code here: as far as I know, it has to access the RWByteAddressBuffer through a uint offset every time, which is why you see so many stores being emitted. I'm not sure if this is preventable; I'll raise it there as well to see if there's a way to make this a bit faster.
That is a SPIR-V limitation: you need to have the access pointed; there is no way to get a "reference of the
If you are using Buffer Device Address, you can actually have a pointer, so it does not have to re-reference things each time.
For the record, the pattern you have is known; we even have a stress test for it, so I am still trying to understand why your version is so much slower than our stress test. Regardless, thanks for raising this. The whole shader instrumentation is an ongoing challenge if you go
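A CPU-side analogy for the difference between the two access styles, assuming that each access that goes back through the descriptor is a separate instrumented lookup. All names are hypothetical and this does not model real SPIR-V or DXC output. In the descriptor path, writing four words costs four lookups. In a pointer path, in the spirit of Buffer Device Address, the base is resolved and range-checked once and the following writes reuse it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a RWByteAddressBuffer behind a descriptor. */
typedef struct {
    uint32_t *words;   /* backing storage as 32-bit words */
    size_t word_count; /* buffer size in words */
    size_t lookups;    /* descriptor lookups, each one a checked access */
} DescriptorBuffer;

/* Descriptor path: every store re-addresses the buffer by byte offset,
   so every store is its own (checked) access. */
static void descriptor_store(DescriptorBuffer *buf, uint32_t byte_offset,
                             uint32_t value) {
    buf->lookups++;
    size_t word = byte_offset / 4;
    if (word < buf->word_count) {
        buf->words[word] = value;
    }
}

/* Pointer path: resolve and range-check the base once, then write
   n_words through the returned pointer. Returns NULL if out of range. */
static uint32_t *resolve_once(DescriptorBuffer *buf, uint32_t byte_offset,
                              size_t n_words) {
    buf->lookups++;
    size_t word = byte_offset / 4;
    if (word + n_words > buf->word_count) {
        return NULL;
    }
    return buf->words + word;
}
```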
Hmm, tricky. DXC also can't just assume you have BDA enabled. Even if it did have a fast path there, you'd somehow have to pass the address, maybe as a root constant, which is impossible because then the type of the descriptor wouldn't be compatible.
Update: I can confirm that I'm receiving no signal on termination, so it's not really possible for me to get a stack trace. I have tested capturing and printing the call stack and they work fine, and segfaults and other signals go through the same handler fine. I think Android just terminates the app without notifying it, instead of sending something like SIGTERM, which would give me the option to print a stack trace. Logcat also doesn't seem to be of any use here; the app just seems to be terminated at random without a log message. (I can also confirm that the workaround fixes it.)
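One possible explanation is that the process is being ended with SIGKILL. That is an assumption: the thread only establishes that no handler ran. POSIX does not allow a handler to be installed for SIGKILL or SIGSTOP, so no in-process code can observe them. The sketch below shows this directly. `try_install` and `noop_handler` are hypothetical helpers.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>

static void noop_handler(int sig) { (void)sig; }

/* Tries to install noop_handler for sig.
   Returns 0 on success, otherwise the errno from sigaction(). */
static int try_install(int sig) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL) == 0 ? 0 : errno;
}
```

`try_install(SIGTERM)` succeeds. `try_install(SIGKILL)` and `try_install(SIGSTOP)` fail with `EINVAL`, so a kill of that kind leaves nothing for a crash handler to report.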
OK, then it is probably 95% what I am expecting it to be... I will have time later this week to look back into the GPU-AV SPIR-V code and will try to find a clever way to prevent this for people with similar shaders.
Environment:
Describe the Issue
Creating this specific SPIR-V compute pipeline (vkCreateComputePipelines) crashes the validation layer on Android. Both the HLSL and the SPIR-V are available, as well as a minimal repro.
This seems to be Android-specific: the validation layer supplied with my current desktop environment behaves fine with both the minimal repro and the real final executable.
Expected behavior
Not crashing.
Valid Usage ID
Additional context
The repro is at https://github.com/Oxsomi/vulkan_validation_layer_repro_android. To reproduce, create lib/arm64-v8a/libVkLayer_khronos_validation.so from the latest validation layer (the .so wasn't included because it was too big). To build the repro you need Conan, Python, CMake, MSYS2, and the Android NDK, SDK and JDK set up correctly.
I can also provide the prebuilt APK if needed. Here is the original source of the SPIR-V file: https://github.com/Oxsomi/rt_core/blob/main/res/shaders/indirect_prepare.hlsl, which generates the following SPIR-V: