[Python][Packaging] Microsoft Visual Studio C++ redistributable version 14.28.29334.0 causes failure of any other Python package built with later MSVC versions #44855

Closed
poshul opened this issue Nov 26, 2024 · 33 comments · Fixed by #35323

poshul commented Nov 26, 2024

Describe the bug, including details regarding any error messages, version, and platform.

The Visual Studio runtime distributed with PyArrow is not compatible with the latest C++ redistributables (14.40... and newer), due to changes in std::mutex's constructor (see microsoft/STL#4730). This means that loading any other Python libraries that make use of newer C++ redistributables after loading pyarrow fails. See OpenMS/OpenMS#7691 for an example of this behavior. Loading libraries packaged with a newer version of the redistributable and then loading pyarrow works fine.

Suggested fix: update the C++ redistributable version.

Component(s)

Python

poshul (Author) commented Nov 26, 2024

I've checked abrarov/msvc-2019:latest, and it still only includes 14.29.30133 in the image, so the DLL would either need to be pulled down manually or potentially omitted (since recent CPython installations include the redistributable).

raulcd (Member) commented Nov 26, 2024

I can see other projects like pytorch just letting the user know that those are required, see:
https://github.com/pytorch/pytorch/blob/2d3d6f9d057322a68547be1d6dee32eb65ac3895/torch/__init__.py#L111-L117

@pitrou @kszucs I am no expert in this area, but I can't seem to find other projects nowadays shipping it with the wheels. Should we just not ship it and potentially validate that it is present, as others do?

@REM bundle the msvc runtime
cp "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Redist\MSVC\14.28.29325\x64\Microsoft.VC142.CRT\msvcp140.dll" pyarrow\

raulcd changed the title to "[Python][Release] Microsoft Visual Studio C++ redistributable version 14.28.29334.0 causes failure of any other Python package built with later MSVC versions" Nov 26, 2024
raulcd changed the title to "[Python][Packaging] Microsoft Visual Studio C++ redistributable version 14.28.29334.0 causes failure of any other Python package built with later MSVC versions" Nov 26, 2024

pitrou (Member) commented Nov 26, 2024

Hmm, I'm not sure. @zooba would you mind giving advice on this? Should we ship the VC++ redistributable as part of the PyArrow wheels, or let users figure it out for themselves?

jpfeuffer commented Nov 26, 2024

In our internal discussion we also saw that current versions of Python seem to ship the VC++ redistributable DLLs as well. When we deleted the DLLs in both projects (pyarrow and pyopenms in this case), both of them worked fine, falling back to the DLLs in the Python directory instead of the ones shipped in the wheels (which end up in the site-packages folders).

I am not sure, though, whether this is something defined in a PEP and therefore something one can rely on.

(Also, one would need to hope/make sure that the DLLs that come with Python are compatible with how your own DLLs were built.)

zooba commented Nov 26, 2024

If you can, I would statically link the C++ redistributables but not the C ones. CPython includes the C runtime (vcruntime140.dll), and there are important reasons to share state there, but we don't include any C++ runtime (msvcp140.dll and some others), which means if you bundle these DLLs, you're going to have to share with whichever other packages have installed them.

Matplotlib made their build change here: https://github.com/matplotlib/matplotlib/pull/28687/files

It doesn't really get much shorter than their PR, but the short version is that you compile with /MT to tell the compiler you want everything statically linked, and then use /nodefaultlib: to exclude the C runtime and add the dynamically linked ones explicitly. This results in statically linked C++ runtime code.
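
(Editorial illustration of the /MT plus /nodefaultlib pattern described above: a minimal setuptools sketch, not PyArrow's actual build setup. The module and source names are placeholders, and the default-library names should be verified against your MSVC toolset.)

# setup.py sketch: statically link the C++ runtime, but keep the C runtime
# (UCRT) and vcruntime dynamic so state stays shared with CPython's copies.
# Note: distutils adds /MD by default; the later /MT overrides it (with a
# compiler warning).
from setuptools import setup, Extension

ext = Extension(
    "example_ext",                       # placeholder name
    sources=["example_ext.cpp"],         # placeholder source
    extra_compile_args=["/MT"],          # request the static runtime libraries
    extra_link_args=[
        "/nodefaultlib:libucrt.lib", "ucrt.lib",            # dynamic UCRT
        "/nodefaultlib:libvcruntime.lib", "vcruntime.lib",  # dynamic vcruntime
    ],
)

setup(name="example-ext", version="0.0.0", ext_modules=[ext])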

As far as I'm aware, the C++ runtime is stateless, so if two extensions have different C++ runtimes statically linked then it won't matter. (It might matter if they're sharing C++ objects, but then there are other reasons they'd need to be built with the same compiler version anyway, so you just need to coordinate on that kind of thing.) The C runtime uses a couple of global variables for state, so if you have multiple copies loaded, they won't be in sync and you can get into trouble.

zooba commented Nov 26, 2024

you're going to have to share with whichever other packages have installed them.

And just to be clear, this bit is usually the problem. If you're the only extension with msvcp140.dll, and there's no system-wide install, then there's no problem. As soon as there are two, it's a risk.
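
(Editorial aside: a quick way to see whether, and from where, a copy of msvcp140.dll is already loaded in the current process is a small ctypes sketch like the one below; Windows-only, using the documented Win32 calls.)

# Report which msvcp140.dll copy, if any, is loaded in this process.
import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.GetModuleHandleW.restype = wintypes.HMODULE
kernel32.GetModuleHandleW.argtypes = [wintypes.LPCWSTR]

handle = kernel32.GetModuleHandleW("msvcp140.dll")
if handle:
    buf = ctypes.create_unicode_buffer(260)
    kernel32.GetModuleFileNameW(handle, buf, len(buf))
    print("msvcp140.dll loaded from:", buf.value)
else:
    print("msvcp140.dll is not loaded in this process")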

Also, FWIW, for Conda builds there'll be a central copy installed and you'll want to use that. I don't know how that impacts your build system, but if you add the options I described above, you might need to make them optional somehow.

pitrou (Member) commented Nov 26, 2024

Thanks for the insight @zooba !

If you can, I would statically link the C++ redistributables but not the C ones.

So, given we have multiple extension modules in PyArrow, it would mean multiple copies of the C++ library. We should check that the impact on wheel size remains small.

Also, FWIW, for Conda builds there'll be a central copy installed and you'll want to use that. I don't know how that impacts your build system, but if you add the options I described above, you might need to make them optional somehow.

We would certainly use different compile/link options for wheel builds, then.

pitrou (Member) commented Nov 26, 2024

Also cc @assignUser

pitrou added this to the 19.0.0 milestone Nov 26, 2024
pitrou added the "Priority: Blocker" label (marks a blocker for the release) Nov 26, 2024

QuLogic (Contributor) commented Dec 2, 2024

So, given we have multiple extension modules in PyArrow, it would mean multiple copies of the C++ library. We should check that the impact on wheel size remains small.

FWIW, in Matplotlib, we have multiple extensions and the total size went down.

zooba commented Dec 2, 2024

Yes, the static library will only link the functions that are actually used, and if your code is relying on templates then most of it is being generated statically already. I believe Matplotlib only referenced a small handful of functions in total from the C++ runtime.

pitrou (Member) commented Dec 2, 2024

I've submitted #44902 out of curiosity, but I'm quite sure it won't work. If it indeed doesn't, then not shipping the C++ redistributable is the simplest solution for us.

andrewn3 commented Dec 2, 2024

I'm sure that not shipping the C++ redistributable will solve the issue; however, it would be useful if errors from any of these components were pushed up the stack in a more user-friendly way.

E.g. I got this nonsensical error message at the top of the stack, and it took me a while to trace it down to Arrow and this library component right at the bottom of the application stack!
[screenshot of the error message]

pitrou (Member) commented Dec 2, 2024

Yes, we'll probably do something like this snippet:

I can see other projects like pytorch just letting the user know that those are required, see:
https://github.com/pytorch/pytorch/blob/2d3d6f9d057322a68547be1d6dee32eb65ac3895/torch/__init__.py#L111-L117
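
(For illustration, a rough sketch of such an import-time check in the spirit of the PyTorch code linked above; this is not the code PyArrow ultimately adopted, just the general shape.)

# Hypothetical import-time guard: fail with an actionable message when the
# C++ runtime DLL cannot be loaded (Windows only).
import ctypes

try:
    ctypes.WinDLL("msvcp140.dll")
except OSError as exc:
    raise ImportError(
        "PyArrow requires the Microsoft Visual C++ Redistributable "
        "(msvcp140.dll). Install it from https://aka.ms/vcredist and retry."
    ) from exc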

pitrou added a commit to pitrou/arrow that referenced this issue Dec 17, 2024
Bundling `msvcp140.dll` in Python Windows wheels risks crashes or undefined behavior if another Python package loads another version of the DLL.

Instead of bundling this widespread DLL, we should let the user install it themselves in the (relatively unlikely) case it is not already available on the system.
This ensures that a single DLL version is shared between all Python packages.

jorisvandenbossche (Member) commented:

I can't seem to find other projects nowadays shipping it with the wheels?

FWIW both numpy and pandas still seem to include msvcp140.dll in their wheels (just checked those two based on the latest wheels on PyPI)

jorisvandenbossche (Member) commented:

Moving my question on the PR here:

we let the user install it themselves in the (relatively unlikely) case it is not already available on the system.

Do we have some idea about when this happens? Does it come generally pre-installed? Or will it generally already be installed because of installing other things?
I am asking because, if you have to install that yourself, it is not that trivial (for an experienced developer maybe yes, but not for a newcomer to programming; for example, just the page we link to is quite complex: you have to understand Visual Studio versions, which architecture you have, etc.).

pitrou (Member) commented Dec 18, 2024

I am asking because, if you have to install that yourself, it is not that trivial (for an experienced developer maybe yes, but not for a newcomer to programming; for example, just the page we link to is quite complex: you have to understand Visual Studio versions, which architecture you have, etc.).

That's a good question. Perhaps we can instead give the direct download link for the system's architecture (x86, x64, arm64).
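
(A minimal sketch of that idea follows. The x64 link is the one given later in the thread; the x86/arm64 variants and the platform.machine() keys are assumptions to double-check.)

# Map the current architecture to a direct vc_redist download link,
# falling back to the generic landing page.
import platform

_REDIST_URLS = {
    "AMD64": "https://aka.ms/vs/17/release/vc_redist.x64.exe",
    "ARM64": "https://aka.ms/vs/17/release/vc_redist.arm64.exe",
    "x86": "https://aka.ms/vs/17/release/vc_redist.x86.exe",
}

url = _REDIST_URLS.get(platform.machine(), "https://aka.ms/vcredist")
print("Install the Visual C++ Redistributable from:", url)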

pitrou (Member) commented Dec 18, 2024

we let the user install it themselves in the (relatively unlikely) case it is not already available on the system.

Do we have some idea about when this happens? Does it come generally pre-installed? Or will it generally already be installed because of installing other things?

I have no idea how likely it is for msvcp140.dll to be already available on the system. @zooba Would you happen to know?

poshul (Author) commented Dec 18, 2024

Moving my question on the PR here:

we let the user install it themselves in the (relatively unlikely) case it is not already available on the system.

Do we have some idea about when this happens? Does it come generally pre-installed? Or will it generally already be installed because of installing other things? I am asking because, if you have to install that yourself, it is not that trivial (for an experienced developer maybe yes, but not for a newcomer to programming; for example, just the page we link to is quite complex: you have to understand Visual Studio versions, which architecture you have, etc.).

Since Python 3.5, CPython's installer checks for, and prompts installation of, the MSVC redistributable (https://docs.python.org/3.5/using/windows.html#installation-steps). They explicitly state that it is NOT included in the embedded distribution. I can't speak to other methods of getting Python, but I would imagine that the majority of casual users will have it installed.

raulcd (Member) commented Dec 18, 2024

At one point we discussed using delvewheel to add those. From its README:

delvewheel is a command-line tool for creating Python wheel packages for Windows that have DLL dependencies that may not be present on the target system. It is functionally similar to auditwheel (for Linux) and delocate (for macOS).

Suppose that you have built a Python wheel for Windows containing an extension module, but the wheel depends on DLLs that are present in the build environment but may not be present on the end user's machine. This tool determines which DLLs a wheel depends on (aside from system libraries) and copies those DLLs into the wheel.

From the discussion it seems that using this approach is discouraged and that the approach should instead be one of:

  • statically link
  • independent installation

I'll close:

pitrou (Member) commented Dec 18, 2024

Since Python 3.5, CPython's installer checks for, and prompts installation of, the MSVC redistributable

I've just looked at a Python 3.12 install on a Windows VM: it contains private copies of vcruntime140, but it does not seem to have installed msvcp140.dll.

jorisvandenbossche (Member) commented Dec 18, 2024

From the discussion it seems that using this approach is discouraged and the approach should be:

I think the usage of delvewheel or not is unrelated to this discussion (there are other DLLs that we definitely need to include in the wheels), but will further comment on the relevant issue.

Since Python 3.5, CPython's installer checks for, and prompts installation of, the MSVC redistributable (https://docs.python.org/3.5/using/windows.html#installation-steps).

I don't have a Windows machine directly available to test, so I will certainly believe you :), but just to point out that, AFAIU, the link you provided about the installation steps does not actually say that this will happen. (I interpreted providing the link as "here it says it will do that".)

pitrou (Member) commented Dec 18, 2024

Ok, here is a standard Python 3.12 install on a Windows VM:

Python 3.12.5 (tags/v3.12.5:ff3bc82, Aug  6 2024, 20:45:27) [MSC v.1940 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
>>> ctypes.CDLL('msvcp140.dll')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python312\Lib\ctypes\__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: Could not find module 'msvcp140.dll' (or one of its dependencies). Try using the full path with constructor syntax.

kszucs (Member) commented Dec 18, 2024

FWIW scikit-learn bundles it since scipy decided to exclude it from the wheels.

Maybe we should chase the static linking approach?

pitrou (Member) commented Dec 18, 2024

Maybe we should chase the static linking approach?

We could, do you want to try doing that?

In the meantime, however, we should solve our issue of shipping an outdated DLL.

kszucs (Member) commented Dec 18, 2024

We could, do you want to try doing that?

Well, someone with an amd64 Windows machine would be better suited :)

jorisvandenbossche (Member) commented:

FWIW scikit-learn bundles it since scipy decided to exclude it from the wheels.

Potentially relevant comment from one of the scikit-learn issues about this: scikit-learn/scikit-learn#24612 (comment). It also mentions "All versions of Python currently supported should already have msvcp140.dll", which seems at odds with Antoine's observation above.

pitrou (Member) commented Dec 18, 2024

Potentially relevant comment from one of the scikit-learn issues about this: scikit-learn/scikit-learn#24612 (comment). It also mentions "All versions of Python currently supported should already have msvcp140.dll", which seems at odds with Antoine's observation above.

Let's ping @henryiii as the author of the original comment, then.

I've now installed both 3.10.11 and 3.12.5 using the official python.org installers; neither ships msvcp140.dll.

pitrou (Member) commented Dec 18, 2024

FWIW both numpy and pandas still seem to include msvcp140.dll in their wheels (just checked those two based on the latest wheels on PyPI)

At least for NumPy, the DLL is renamed in the wheel (I suppose thanks to delvewheel): I have
C:\Python312\Lib\site-packages\numpy.libs\msvcp140-d64049c6e3865410a7dda6a7e9f0c575.dll.
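
(Editorial aside: to see which installed packages bundle a plain or name-mangled copy of the DLL, a quick scan of site-packages works, e.g. this sketch run inside the environment being inspected.)

# List bundled msvcp140 copies (plain or name-mangled) across site-packages.
import pathlib
import site

for root in map(pathlib.Path, site.getsitepackages()):
    for dll in root.rglob("msvcp140*.dll"):
        print(dll)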

zooba commented Dec 18, 2024

CPython doesn't ever include msvcp140.dll, or any other C++-only DLLs, and never has. If it's on the system, it's because something else installed it. And if something else has installed it, the only ways you're getting your own copy is to keep it alongside your extension module, put it alongside python.exe (please don't), or rename it and rewrite your module's import tables (like delvewheel does, IIUC).

The download page at https://aka.ms/vcredist lists direct download links - e.g. https://aka.ms/vs/17/release/vc_redist.x64.exe for an Intel 64-bit OS. You can display these directly if you prefer, based on platform.machine()'s result.

Using a renamed msvcp140.dll is likely to be okay provided that your module doesn't leak any C++ (objects or exceptions) into others. If everything is handled cleanly at Python boundaries, you should be equally fine with a renamed C++ runtime or static linking. As far as I'm aware, the msvcp140.dll is stateless, and all state is managed by vcruntime140.dll (which is included with Python, and you should try and rely on the same instance).

Overall, my recommendation for extension modules is to statically link the C++ runtime, and try to avoid sharing C++ objects with other libraries, except through your own APIs (they can handle a pointer and pass it back to you, but that's probably about it).

pitrou (Member) commented Dec 18, 2024

Using a renamed msvcp140.dll is likely to be okay provided that your module doesn't leak any C++ (objects or exceptions) into others. If everything is handled cleanly at Python boundaries, you should be equally fine with a renamed C++ runtime or static linking.

Ok, I think it may be easier for us to use a renamed msvcp140.dll than to get our build chain to enable static linking of this specific library.

Edit: also, it would probably be better to bundle an up-to-date version of the DLL :)

QuLogic (Contributor) commented Dec 19, 2024

FWIW scikit-learn bundles it since scipy decided to exclude it from the wheels.

Potentially relevant comment from one of the scikit-learn issues about this: scikit-learn/scikit-learn#24612 (comment). It also mentions "All versions of Python currently supported should already have msvcp140.dll", which seems at odds with Antoine's observation above.

This seems to be an error; if you follow through to @henryiii's PR to cibuildwheel, pypa/cibuildwheel#1093, it's actually about vcruntime140.dll and vcruntime140_1.dll.

raulcd (Member) commented Dec 19, 2024

Does someone know how it fails? On the original issue I can see:

This means that loading any other Python libraries that make use of newer C++ redistributables after loading pyarrow fails

I am just trying to understand if the issues mentioned could be related.

pitrou (Member) commented Dec 19, 2024

Does someone know how it fails?

There's an example ImportError in this issue: OpenMS/OpenMS#7691 (comment)

But, depending on the import order, perhaps it can also crash. Who knows?

pitrou added a commit that referenced this issue Jan 4, 2025
…#35323)

### Rationale for this change

We need to ship the C++ standard library with our Windows wheels, as it is not guaranteed that a recent enough version is present on the system. However, some other Python libraries may require an even more recent version than the one we ship. This may cause crashes when PyArrow is imported before such a library, as the older version of the C++ standard library would be used by both.

### What changes are included in this PR?

Use a [fixed-up version](adang1345/delvewheel#59) of delvewheel that allows us to name-mangle an individual DLL, and name-mangle `msvcp140.dll` to ensure that other Python libraries do not reuse the version we ship.

### Are these changes tested?

By regular wheel build tests.

* Closes: #44855
* GitHub Issue: #33981
* GitHub Issue: #44855

Lead-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: Raúl Cumplido <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>