
Make discovery mechanism for cuda/_include directory compatible with pip install --editable #2846

Merged
merged 3 commits into NVIDIA:main from cuda_parallel_pip_install_editable
Nov 18, 2024

Conversation

rwgk (Contributor) commented Nov 16, 2024

Description

Also add pytest-xdist (pytest plugin for distributed testing) to test requirements. This speeds up iterative development:

cuda_cooperative:

  • pytest default (one worker): 322.17s
  • pytest -n 16: 27.17s

cuda_parallel:

  • pytest default (one worker): 33.88s
  • pytest -n 16: 17.00s

pip install --editable documentation:

Tested interactively with:

cd python/cuda_cooperative
pip3 uninstall cuda_cooperative
rm -rf build cuda/_include
pip3 install --editable .[test]
pytest -v tests/ -n 16

cd python/cuda_parallel
pip3 uninstall cuda_parallel
rm -rf build cuda/_include
pip3 install --editable .[test]
pytest -v tests/ -n 16

Also resolving this warning when testing cuda_cooperative:

  /home/coder/cccl/python/cuda_cooperative/cuda/cooperative/experimental/_nvrtc.py:49: DeprecationWarning: pathlib.Path.__enter__() is deprecated and scheduled for removal in Python 3.13; Path objects as a context manager is a no-op
    with pkg_resources.files('cuda.cooperative').parent.joinpath('_include') as include_path:
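
As a rough sketch, the warning goes away by using the resolved path directly instead of as a context manager, mirroring the `.parent` pattern discussed below (the exact wording in `_nvrtc.py` may differ):

import importlib.resources

# Resolve the shared headers directory that sits next to the cuda.cooperative
# package. Using the Path directly avoids the deprecated
# "Path as context manager" pattern, which is a no-op anyway.
include_path = importlib.resources.files('cuda.cooperative').parent.joinpath('_include')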

@rwgk rwgk requested a review from a team as a code owner November 16, 2024 01:52
@rwgk rwgk requested review from gonidelis and gevtushenko and removed request for gonidelis November 16, 2024 01:52
Contributor

🟩 CI finished in 15m 57s: Pass: 100%/1 | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s
  • 🟩 python: Pass: 100%/1 | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
+/- python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
+/- python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 1)

# Runner
1 linux-amd64-gpu-v100-latest-1

Comment on lines 163 to 164
-include_path = importlib.resources.files('cuda').joinpath('_include')
+include_path = importlib.resources.files(
+    'cuda.parallel').parent.joinpath('_include')
Member

Q: where would this new path be located? site-packages/cuda/parallel/_include or site-packages/cuda/_include? Asking because the plan (#2281) was to share the headers between cuda.parallel and cuda.cooperative, and eventually ship a standalone CCCL wheel that's header-only (but not using, e.g., https://pypi.org/project/nvidia-cuda-cccl-cu12/).

Contributor Author

Q: where would this new path be located?

It's not a new path; only the discovery mechanism changes in this PR.

Asking because the plan (#2281) was to share the headers between cuda.parallel and cuda.cooperative

Yes, in the regular wheel (without --editable) the directory is (e.g. in my CCCL Dev Container and a devenv virtual environment):

/home/coder/cccl/python/devenv/lib/python3.12/site-packages/cuda/_include

That is shared between cuda.parallel and cuda.cooperative.

But this intermediate artifact from running pip3 install .[test] also exists:

/home/coder/cccl/python/cuda_parallel/cuda/_include/

When using --editable, pip does not make a copy of _include in site-packages/cuda/.

But site-packages/cuda does exist, therefore

importlib.resources.files('cuda').joinpath('_include')

produces site-packages/cuda/_include even though that directory does not exist.

When using

importlib.resources.files('cuda.parallel').parent.joinpath('_include')

the Python import machinery is forced to find cuda.parallel, which does not exist in site-packages/cuda if --editable is used. The --editable feature only drops some "dynamic import forwarding code" (my hand-wavy description) into site-packages/cuda, which finds cuda.parallel in the source directory. Conveniently, _include happens to be in the right place there too, so the puzzle pieces fall nicely into place.

It took me almost a couple hours yesterday to figure this out btw. 😅

Caveat: I've not tried to pip install cuda.cooperative into the same devenv, but I don't think this PR changes anything if --editable is not used.
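
To make the difference concrete, here is a minimal sketch (illustrative only; it just restates the discovery logic described above):

import importlib.resources

# Regular and editable installs both have site-packages/cuda, so this
# resolves even when _include was never copied into site-packages/cuda:
maybe_missing = importlib.resources.files('cuda').joinpath('_include')

# Forcing resolution through cuda.parallel lands in whichever directory
# actually contains the package -- site-packages for a regular install,
# the source tree for an editable install -- and _include sits next to
# it in both cases:
include_path = importlib.resources.files('cuda.parallel').parent.joinpath('_include')
assert include_path.is_dir(), include_path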

@@ -112,6 +112,7 @@ def build_extension(self, ext):
     extras_require={
         "test": [
             "pytest",
+            "pytest-xdist",
Member

Note: be careful if we have large tests, because GPUs can easily go OOM if tests run in parallel. In cuQuantum we had this enabled, but at QA time we found it often caused false positives on small GPUs like the T4.

Contributor Author

Thanks for the warning. In automated testing (e.g. GitHub Actions) I'd add the -n option only when we really need it. In interactive testing it makes a big difference. I posted the timings in the PR description. For cuda_cooperative, pytest -n 16 finishes after 27s, while it takes 322s with only one worker.
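
For illustration, one hypothetical way to keep the parallelism opt-in while still giving -n auto a sane ceiling is pytest-xdist's pytest_xdist_auto_num_workers hook (a sketch only; the PYTEST_MAX_WORKERS variable is an invented name, and the hook applies only when -n auto is passed):

# conftest.py (sketch): cap the worker count chosen by `pytest -n auto`,
# so small GPUs are less likely to run out of memory. Plain `pytest`
# (no -n) stays single-process, and an explicit `-n 16` still wins.
import os

def pytest_xdist_auto_num_workers(config):
    # Honor an opt-in environment variable; default to a conservative cap.
    return int(os.environ.get("PYTEST_MAX_WORKERS", "4"))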

@rwgk rwgk changed the title Make discovery mechanism for cuda_parallel cuda/_include directory compatible with pip install --editable Make discovery mechanism for cuda/_include directory compatible with pip install --editable Nov 18, 2024
Contributor

🟩 CI finished in 15m 12s: Pass: 100%/1 | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s
  • 🟩 python: Pass: 100%/1 | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
+/- python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
+/- python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 1)

# Runner
1 linux-amd64-gpu-v100-latest-1

@rwgk rwgk merged commit 3da084a into NVIDIA:main Nov 18, 2024
20 checks passed
@rwgk rwgk deleted the cuda_parallel_pip_install_editable branch November 18, 2024 21:15
trxcllnt pushed a commit to trxcllnt/cccl that referenced this pull request Nov 23, 2024
Make discovery mechanism for cuda/_include directory compatible with `pip install --editable` (NVIDIA#2846)

* Make discovery mechanism for cuda/_include directory compatible with `pip install --editable`

* Add pytest-xdist to test requirements.

* Apply the same `.parent` trick to cuda_cooperative, as suggested by @gevtushenko