[Python] Windows fatal exception: access violation #40100
Comments
Hi @powersj. This looks similar to #37852, though we weren't able to reproduce the crash in that issue. I couldn't reproduce it, though I had to modify your script to run on my system and write a simple server implementation to test against. Would it be possible to share a self-contained example of both the client and server code? cc @lidavidm
One thing that jumped out at me in your logs is the set of lines in your traceback like this:
That looks like a reference to the Windows Store version of Python, but I'd expect all the paths to lead to your conda environment, so I wonder if the crash is due to mixing two Python environments. Have you tried capturing your crash with WinDbg Preview?
@assignUser @kou do either of you have any idea about the build issue at the bottom of the OP? I also get that when I try to do a Windows+conda+clang build.
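One quick way to check for mixed environments is to print where the interpreter and the package actually come from, once from the notebook and once from a plain script, and compare the two. This is only a minimal sketch: `describe_environment` is a hypothetical helper name, and it only reports paths, it does not prove or rule out the crash cause.

```python
import importlib.util
import sys


def describe_environment(package="pyarrow"):
    """Collect interpreter and package locations for the running process."""
    # find_spec locates a top-level package without importing it,
    # and returns None when the package is not installed.
    spec = importlib.util.find_spec(package)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        "package_origin": spec.origin if spec is not None else None,
        "sys_path": list(sys.path),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `executable` or any `sys_path` entry points at the Windows Store install while `package_origin` points into the conda environment or venv, the two environments are being mixed.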
For Windows, you'll want to use vcvarsall.bat or whatever the modern equivalent is; don't muck with the env vars yourself. Also, possibly try the VS generator for CMake instead of Ninja.
I don't have any clue about the crash itself. We would need a way to reproduce it. You could also try downloading "WinDbg Preview" from the Windows Store and running your script as
I think so too.
If you use one of the Visual Studio Generators https://cmake.org/cmake/help/latest/manual/cmake-generators.7.html#visual-studio-generators , you don't need to use
Thanks for all the responses, especially around building the Python libraries on Windows. It does seem that changing the cmake target has allowed me to get further along via
I did find that the Python build requires the older 2017 libraries to be installed; these are already referenced in the Python docs. I had some success with the debugger below though.
I launched the notebook, attached to the python process with the time travel option, and caught it. How can I better share this with you? Does this collect anything helpful? Would it help to share the time travel capture? fwiw it is 620MB.
Full stack:
Shoot. I think I've seen this once or twice but was never able to figure it out. Right here you basically make an impossible/nonsensical jump:
That is, ToTable should never call that function. So something is seriously borked. I don't really want to blame a "compiler bug" but... Well. When you generated this stack, which PyArrow package were you using? (If a wheel, what version exactly?) We could disassemble
I am going to assume
$ pip show pyarrow
Name: pyarrow
Version: 15.0.0
Summary: Python library for Apache Arrow
Home-page: https://arrow.apache.org/
Author:
Author-email:
License: Apache License, Version 2.0
Location: C:\Users\powersj\v3-ear\venv\Lib\site-packages
Requires: numpy
Required-by:
Digging through the site-packages the pyarrow-15.0.0.dist-info/WHEEL I see:
Wheel-Version: 1.0
Generator: bdist_wheel (0.41.1)
Root-Is-Purelib: false
Tag: cp311-cp311-win_amd64
Ok. I think it's a virtual call:
That'd make sense given the implementation: arrow/cpp/src/arrow/flight/client.cc Lines 113 to 118 in 11ef68d
So I'd hazard that we have a nullptr or otherwise invalid reader here, and instead of crashing we're just jumping to oblivion. That doesn't explain how we got said reader...
Here's another curious thing.
...that's supposed to initialize the asyncio native library. How is that in the stack trace?
Hmm, actually, you mention this only happens in a notebook? Does IPython fork the Python kernel process or something?
Correct, I seem to be able to run the same code as a python script (e.g.
What else could I provide to help dig into this further?
I think we're going to have to replicate it, and then try to track down a debug build, unfortunately. Or if you know the Python stack trace of the crash we could start investigating from that side.
Thanks for looking into this so far.
In my original comment I used faulthandler to grab a traceback. Does that provide any pointers?
If I'm not mistaken, I don't see any Flight RPC frames in that traceback.
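For anyone following along, a minimal sketch of how a faulthandler traceback can be captured with the standard library. `enable_crash_traceback` is a hypothetical helper name. The file option is there on the assumption that a notebook kernel's `sys.stderr` may not be backed by a real file descriptor, which faulthandler needs.

```python
import faulthandler
import sys


def enable_crash_traceback(log_path=None):
    """Enable faulthandler for all threads, optionally logging to a file.

    Returns the open file object when log_path is given (the caller must
    keep a reference so the file stays open), or None when using stderr.
    """
    if log_path is None:
        # Requires sys.stderr to expose a real file descriptor.
        faulthandler.enable(file=sys.stderr, all_threads=True)
        return None
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling `crash_log = enable_crash_traceback("crash.log")` in the first notebook cell keeps the Python-level traceback on disk even if the kernel dies. The same handler can be enabled without code changes by starting Python with `-X faulthandler` or setting `PYTHONFAULTHANDLER=1`.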
Hmm, or well possibly it's
but L26 there is just in the middle of making a dictionary...
@lidavidm is there anything else I could try or provide?
I think either we need a reproducer to look at, or we need to figure out how to produce a debug build and get a backtrace that way. But I've never figured out exactly how to get a debug build working on Windows.
It's also possible that it's something like grpc/grpc#29185 which I never managed to track down.
Would getting the debug grpc logs aid to confirm that it might be related?
We can look, but for that issue, I had to attach a debugger - the debug grpc logs don't really tell much in case of a crash
Sorry, I haven't gotten any time to actually fire up a Windows VM and try to attempt anything - I'm heavily timeboxed these days and anything Windows automatically eats up a good portion of the day
Completely understand, especially since I am unable to provide a direct reproducer. Please do let me know if there is anything else I can provide or help with. Very happy to run some sort of debug build as well.
I've already spent a bit of time on a reproduction (no luck so far) and can also see about a debug build while I'm there. I'll update here with what I find.
@amoeba Bryce, have you been able to identify any next steps?
Hi @dburton-influxdata, I think the next step here is still to get a debug build in your hands. I can take another shot at it in the next two weeks here and let you know how that goes.
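For reference, gRPC's debug logging is normally switched on through the `GRPC_VERBOSITY` and `GRPC_TRACE` environment variables. A minimal sketch of setting them from Python follows. `enable_grpc_debug_logging` is a hypothetical helper name, and it assumes the gRPC bundled with the PyArrow wheel reads these variables when it initializes, which may differ between gRPC versions.

```python
import os


def enable_grpc_debug_logging(trace="all"):
    """Set gRPC's logging environment variables for this process.

    Assumption: this runs before gRPC initializes, so call it before
    importing pyarrow.flight.
    """
    os.environ["GRPC_VERBOSITY"] = "DEBUG"
    # Comma-separated tracer names; "all" enables every tracer.
    os.environ["GRPC_TRACE"] = trace
    return {name: os.environ[name] for name in ("GRPC_VERBOSITY", "GRPC_TRACE")}
```

Setting the same two variables in the shell before launching the notebook server is likely the more dependable route, since the kernel process then inherits them from the start.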
Just as an update: I didn't end up having the time I had hoped for, so I haven't looked into this more, but producing a debug build would still be the best next step, I think.
Possibly related: #44855
Describe the bug, including details regarding any error messages, version, and platform.
Hi,
When using the pyarrow flight client, I have a user who occasionally sees a Windows fatal exception error. This involves a query with multiple subqueries across many fields. I do have access to the environment and can reproduce. We have found that there is some sort of correlation between the number of fields and the exception occurring. As we decrease the number of fields the issue can occur less and less consistently.
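To put numbers on that correlation, each attempt can be run in a fresh interpreter and the exit status recorded, so a crash does not take the harness down with it. The sketch below is an assumption-laden outline, not the script I actually used: `run_isolated`, `crashed_with_access_violation`, and `crash_rate` are hypothetical names, and `script_source` stands in for a query script that reads the field count from `sys.argv[1]`.

```python
import subprocess
import sys

# NTSTATUS STATUS_ACCESS_VIOLATION, as seen in a Windows process exit code.
ACCESS_VIOLATION = 0xC0000005


def run_isolated(script_source, *args, timeout=300):
    """Run script_source in a fresh interpreter with faulthandler enabled."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", script_source, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def crashed_with_access_violation(result):
    """True if the child exited with the Windows access-violation status."""
    # Mask to 32 bits in case the code is surfaced as a negative integer.
    return (result.returncode & 0xFFFFFFFF) == ACCESS_VIOLATION


def crash_rate(script_source, field_counts, runs_per_count=5):
    """Map each field count to the fraction of runs that crashed."""
    rates = {}
    for count in field_counts:
        crashes = sum(
            crashed_with_access_violation(run_isolated(script_source, str(count)))
            for _ in range(runs_per_count)
        )
        rates[count] = crashes / runs_per_count
    return rates
```

The faulthandler output from each child ends up in `result.stderr`, so the Python-level traceback of every crashing run can be kept next to its field count.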
I realize that getting an issue without exact steps to reproduce is unhelpful. However, I am more than willing to try out test builds or build a custom version to gather more details if I can get some guidance.
I was able to easily build a custom version on Linux per the dev docs, but I tried building a custom pyarrow on Windows and ran into issues right away with detection of the compiler. I have my steps and logs below.
Observations
arrow_flight.dll
Windows Event Log Message
Code
Traceback
System Information
When using conda:
Build Attempt
It is not clear to me which compiler I am supposed to use: something from the conda environment, or the locally installed one?
If I try setting via the CC and CXX env variables I get:
Component(s)
Python