-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Python][Packaging] Microsoft visual studio C++ redistributable version 14.28.29334.0 causes failure of any other python packaged built with later MSVC versions #44855
Comments
I've checked abrarov/msvc-2019:latest, and it still only has 14.29.30133 included in the image so the DLL would either need to be pulled down manually, or potentially omitted (since recent cpython installations include the redistributable). |
I can see other projects like pytorch just letting the user know that those are required, see: @pitrou @kszucs I am no expert on this area but I can't seem to find other projects nowadays shipping it with the wheels? Should we just not ship it and potentially validating is present as what others do? arrow/ci/scripts/python_wheel_windows_build.bat Lines 124 to 125 in f545b90
|
Hmm, I'm not sure. @zooba would you mind giving advice on this? Should we ship the VC++ redistributable as part of the PyArrow wheels, or let users figure it out for themselves? |
In our internal discussion we also saw that current versions of python seem to ship with the VC++ redistributable dlls as well, and when we deleted the dlls in both projects (pyarrow and pyopenms in this case) both of them worked fine, falling back to the dlls in the python directory (instead of the ones shipped in the wheels which end up in the site-packages folders). I am not sure though if this something defined in a PEP and therefore something to be able to rely on. (Also, one would need to hope/make sure that the dlls that come with python are compatible with how your own dlls were built). |
If you can, I would statically link the C++ redistributables but not the C ones. CPython includes the C runtime ( Matplotlib made their build change here: https://github.com/matplotlib/matplotlib/pull/28687/files It doesn't really get much shorter than their PR, but the short version is that you compile with As far as I'm aware, the C++ runtime is stateless, so if two extensions have different C++ runtimes statically linked then it won't matter. (It might matter if they're sharing C++ objects, but then there are other reasons they'd need to be built with the same compiler version anyway, so you just need to coordinate on that kind of thing.) The C runtime uses a couple of global variables for state, so if you have multiple copies loaded, they won't be in sync and you can get into trouble. |
And just to be clear, this bit is usually the problem. If you're the only extension with Also, FWIW, for Conda builds there'll be a central copy installed and you'll want to use that. I don't know how that impacts your build system, but if you add the options I described above, you might need to make them optional somehow. |
Thanks for the insight @zooba !
So, given we have multiple extension modules in PyArrow, it would mean multiple copies of the C++ library. We should check that the impact on wheel size remains small.
We would certainly use different compile/link options for wheel builds, then. |
Also cc @assignUser |
FWIW, in Matplotlib, we have multiple extensions and the total size went down. |
Yes, the static library will only link the functions that are actually used, and if your code is relying on templates then most of it is being generated statically already. I believe Matplotlib only referenced a small handful of functions in total from the C++ runtime. |
I've submitted #44902 out of curiosity but I'm quite sure it won't work. If it doesn't indeed, then not shipping the C++ redistributable is the simplest solution for us. |
Yes, we'll probably do something like this snippet:
|
Bundling `msvcp140.dll` in Python Windows wheels risks crashes or undefined behavior if another Python package loads another version of the DLL. Instead of bundling this widespread DLL, we should let the user install it themselves in the (relatively unlikely) case it is not already available on the system. This ensures that a single DLL version is shared between all Python packages.
FWIW both numpy and pandas still seem to include msvcp140.dll in their wheels (just checked those two based on the latest wheels on PyPI) |
Moving my question on the PR here:
Do we have some idea about when this happens? Does it come generally pre-installed? Or will it generally already be installed because of installing other things? |
That's a good question. Perhaps we can instead give the direct download link for the system's architecture (x86, x64, arm64). |
I have no idea how likely it is for |
Since python 3.5 cpythons python installer checks for, and prompt installation of the MSVC (https://docs.python.org/3.5/using/windows.html#installation-steps). They explicitly state that it is NOT included in embedded distribution. And I can't speak to other methods for getting python, but I would imagine that the majority of casual users will have it installed. |
At one point we discussed to use delvewheel to add those. From their README:
From the discussion it seems that using this approach is discouraged and the approach should be:
I'll close: |
I've just looked at a Python 3.12 install on a Windows VM: it contains private copies of |
I think the usage of delvewheel or not is unrelated to this discussion (there are other DLLs that we definitely need to include in the wheels), but will further comment on the relevant issue.
I don't have a Windows directly available to test, so certainly will believe you saying so :), but just to point out that the link you provided about the installation steps does not actually say anything about that this will happen AFAIU? (I interpreted providing the link as "here it says it will do that") |
Ok, here is a standard Python 3.12 install on a Windows VM: Python 3.12.5 (tags/v3.12.5:ff3bc82, Aug 6 2024, 20:45:27) [MSC v.1940 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
>>> ctypes.CDLL('msvcp140.dll')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python312\Lib\ctypes\__init__.py", line 379, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: Could not find module 'msvcp140.dll' (or one of its dependencies). Try using the full path with constructor syntax. |
FWIW scikit-learn bundles it since spicy decided to exclude it from the wheels. Maybe we should chase the static linking approach? |
We could, do you want to try doing that? In the meantime, however, we should solve our issue of shipping an outdated DLL. |
Well, someone with an amd64 windows machine could be better suited :) |
Potentially relevant comment from one of the scikit-learn issues about this: scikit-learn/scikit-learn#24612 (comment) (this also mentions "All versions of Python currently supported should already have |
Let's ping @henryiii as the author of the original comment, then. I've now installed both 3.10.11 and 3.12.5 using the official python.org installers, neither ships |
At least for NumPy, the DLL is renamed in the wheel (I suppose thanks to |
CPython doesn't ever include msvcp140.dll, or any other C++-only DLLs, and never has. If it's on the system, it's because something else installed it. And if something else has installed it, the only ways you're getting your own copy is to keep it alongside your extension module, put it alongside python.exe (please don't), or rename it and rewrite your module's import tables (like delvewheel does, IIUC). The download page at https://aka.ms/vcredist lists direct download links - e.g. https://aka.ms/vs/17/release/vc_redist.x64.exe for an Intel 64-bit OS. You can display these directly if you prefer, based on Using a renamed msvcp140.dll is likely to be okay provided that your module doesn't leak any C++ (objects or exceptions) into others. If everything is handled cleanly at Python boundaries, you should be equally fine with a renamed C++ runtime or static linking. As far as I'm aware, the msvcp140.dll is stateless, and all state is managed by vcruntime140.dll (which is included with Python, and you should try and rely on the same instance). Overall, my recommendation for extension modules is to statically link the C++ runtime, and try to avoid sharing C++ objects with other libraries, except through your own APIs (they can handle a pointer and pass it back to you, but that's probably about it). |
Ok, I think it may be easier for us to use a renamed Edit: also, it would probably be better to bundle an up-to-date version of the DLL :) |
This seems to be an error; if you follow through to @henryiii's PR to cibuildwheel, pypa/cibuildwheel#1093, it's actually about |
Does someone know how it fails? on the original issue I can see:
I am just trying to understand if the issues mentioned could be related. |
There's an example But, depending on the import order, perhaps it can also crash. Who knows? |
…#35323) ### Rationale for this change We need to ship the C++ standard library with our Windows wheels, as it is not guaranteed that a recent enough version is present on the system. However, some other Python libraries may require an even more recent version than the one we ship. This may incur crashes when PyArrow is imported before such other Python library, as the older version of the C++ standard library would be used by both. ### What changes are included in this PR? Use a [fixed-up version](adang1345/delvewheel#59) of delvewheel that allows us to name-mangle an individual DLL, and name-mangle `msvcp140.dll` to ensure that other Python libraries do not reuse the version we ship. ### Are these changes tested? By regular wheel build tests. * Closes: #44855 * GitHub Issue: #33981 * GitHub Issue: #44855 Lead-authored-by: Antoine Pitrou <[email protected]> Co-authored-by: Raúl Cumplido <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
Describe the bug, including details regarding any error messages, version, and platform.
The visual studio runtime distributed with pyArrow is not compatible with the latest C++ redistributables (14.40... and newer), due to changes in the mutex's constructor (see microsoft/STL#4730). This means that loading any other python libraries that make use of newer C++ redistributables after loading pyarrow fails. See OpenMS/OpenMS#7691 for an example of this behavior. Loading libraries packaged with a newer version of the redistributable and then loading pyarrow works fine.
Suggested fix: update the C++ redistributable version.
Component(s)
Python
The text was updated successfully, but these errors were encountered: