
Make discovery mechanism for cuda/_include directory compatible with pip install --editable #2846

Merged
merged 3 commits into NVIDIA:main from cuda_parallel_pip_install_editable
Nov 18, 2024

Conversation

rwgk (Contributor) commented Nov 16, 2024

Description

Also add pytest-xdist (pytest plugin for distributed testing) to test requirements. This speeds up iterative development:

cuda_cooperative:

  • pytest default (one worker): 322.17s
  • pytest -n 16: 27.17s

cuda_parallel:

  • pytest default (one worker): 33.88s
  • pytest -n 16: 17.00s

pip install --editable documentation:

Tested interactively with:

cd python/cuda_cooperative
pip3 uninstall cuda_cooperative
rm -rf build cuda/_include
pip3 install --editable .[test]
pytest -v tests/ -n 16

cd python/cuda_parallel
pip3 uninstall cuda_parallel
rm -rf build cuda/_include
pip3 install --editable .[test]
pytest -v tests/ -n 16

Also resolving this warning when testing cuda_cooperative:

  /home/coder/cccl/python/cuda_cooperative/cuda/cooperative/experimental/_nvrtc.py:49: DeprecationWarning: pathlib.Path.__enter__() is deprecated and scheduled for removal in Python 3.13; Path objects as a context manager is a no-op
    with pkg_resources.files('cuda.cooperative').parent.joinpath('_include') as include_path:
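
As a rough sketch, the warning goes away by using the resolved path directly instead of as a context manager, mirroring the `.parent` pattern discussed below (the exact wording in `_nvrtc.py` may differ):

import importlib.resources

# Resolve the shared headers directory that sits next to the cuda.cooperative
# package. Using the Path directly avoids the deprecated
# "Path as context manager" pattern, which is a no-op anyway.
include_path = importlib.resources.files('cuda.cooperative').parent.joinpath('_include')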

@rwgk rwgk requested a review from a team as a code owner November 16, 2024 01:52
@rwgk rwgk requested review from gonidelis and gevtushenko and removed request for gonidelis November 16, 2024 01:52
Contributor

🟩 CI finished in 15m 57s: Pass: 100%/1 | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s
  • 🟩 python: Pass: 100%/1 | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
+/- python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
+/- python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 1)

# Runner
1 linux-amd64-gpu-v100-latest-1

Comment on lines 163 to 164
-include_path = importlib.resources.files('cuda').joinpath('_include')
+include_path = importlib.resources.files(
+    'cuda.parallel').parent.joinpath('_include')
Member

Q: where would this new path be located? site-packages/cuda/parallel/_include or site-packages/cuda/_include? Asking because the plan (#2281) was to share the headers between cuda.parallel and cuda.cooperative, and eventually ship a standalone CCCL wheel that's header-only (but not using, e.g., https://pypi.org/project/nvidia-cuda-cccl-cu12/).

Contributor Author

Q: where would this new path be located?

It's not a new path; only the discovery mechanism changes in this PR.

Asking because the plan (#2281) was to share the headers between cuda.parallel and cuda.cooperative

Yes, in the regular wheel (without --editable) the directory is (e.g. in my CCCL Dev Container and a devenv virtual environment):

/home/coder/cccl/python/devenv/lib/python3.12/site-packages/cuda/_include

That is shared between cuda.parallel and cuda.cooperative.

But this intermediate artifact from running pip3 install .[test] also exists:

/home/coder/cccl/python/cuda_parallel/cuda/_include/

When using --editable, pip does not make a copy of _include in site-packages/cuda/.

But site-packages/cuda does exist, therefore

importlib.resources.files('cuda').joinpath('_include')

produces site-packages/cuda/_include even though that directory does not exist.

When using

importlib.resources.files('cuda.parallel').parent.joinpath('_include')

the Python import machinery is forced to find cuda.parallel, which does not exist in site-packages/cuda if --editable is used. The --editable feature only drops some "dynamic import forwarding code" (my hand-wavy description) into site-packages/cuda, which finds cuda.parallel in the source directory. Conveniently, _include happens to be in the right place there too, so the puzzle pieces fall nicely into place.

It took me almost a couple hours yesterday to figure this out btw. 😅

Caveat: I've not tried to pip install cuda.cooperative into the same devenv, but I don't think this PR changes anything if --editable is not used.
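
To make the difference concrete, here is a minimal sketch (illustrative only; it just restates the discovery logic described above):

import importlib.resources

# Regular and editable installs both have site-packages/cuda, so this
# resolves even when _include was never copied into site-packages/cuda:
maybe_missing = importlib.resources.files('cuda').joinpath('_include')

# Forcing resolution through cuda.parallel lands in whichever directory
# actually contains the package -- site-packages for a regular install,
# the source tree for an editable install -- and _include sits next to
# it in both cases:
include_path = importlib.resources.files('cuda.parallel').parent.joinpath('_include')
assert include_path.is_dir(), include_path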

@@ -112,6 +112,7 @@ def build_extension(self, ext):
     extras_require={
         "test": [
             "pytest",
+            "pytest-xdist",
Member

Note: be careful if we have large tests, because GPUs can easily go OOM if tests run in parallel. In cuQuantum we had this enabled, but at QA time we found it often caused false positives on small GPUs like the T4.

Contributor Author

Thanks for the warning. In automated testing (e.g. GitHub Actions) I'd add the -n option only when we really need it. In interactive testing it makes a big difference. I posted the timings in the PR description. For cuda_cooperative, pytest -n 16 finishes after 27s, while it takes 322s with only one worker.
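
For illustration, one hypothetical way to keep the parallelism opt-in while still giving -n auto a sane ceiling is pytest-xdist's pytest_xdist_auto_num_workers hook (a sketch only; the PYTEST_MAX_WORKERS variable is an invented name, and the hook applies only when -n auto is passed):

# conftest.py (sketch): cap the worker count chosen by `pytest -n auto`,
# so small GPUs are less likely to run out of memory. Plain `pytest`
# (no -n) stays single-process, and an explicit `-n 16` still wins.
import os

def pytest_xdist_auto_num_workers(config):
    # Honor an opt-in environment variable; default to a conservative cap.
    return int(os.environ.get("PYTEST_MAX_WORKERS", "4"))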

@rwgk rwgk changed the title Make discovery mechanism for cuda_parallel cuda/_include directory compatible with pip install --editable Make discovery mechanism for cuda/_include directory compatible with pip install --editable Nov 18, 2024
Contributor

🟩 CI finished in 15m 12s: Pass: 100%/1 | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s
  • 🟩 python: Pass: 100%/1 | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
+/- python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
+/- python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 1)

# Runner
1 linux-amd64-gpu-v100-latest-1

@rwgk rwgk merged commit 3da084a into NVIDIA:main Nov 18, 2024
20 checks passed
@rwgk rwgk deleted the cuda_parallel_pip_install_editable branch November 18, 2024 21:15
trxcllnt pushed a commit to trxcllnt/cccl that referenced this pull request Nov 23, 2024
Make discovery mechanism for cuda/_include directory compatible with `pip install --editable` (NVIDIA#2846)

* Make discovery mechanism for cuda/_include directory compatible with `pip install --editable`

* Add pytest-xdist to test requirements.

* Apply the same `.parent` trick to cuda_cooperative, as suggested by @gevtushenko