[STF] Do not keep track of dangling events in a CUDA graph backend #3327

caugonnet · 2025-01-10T16:27:50Z

Description

Unlike in the CUDA stream backend, all nodes of a CUDA graph are necessarily complete once the CUDA graph itself completes. Keeping track of "dangling events" in the graph backend is therefore a waste of time and resources.

closes
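A minimal host-only C++ sketch of this argument, under a deliberately simplified model: no CUDA calls are made, and `event`, `stream_model` and `graph_model` are hypothetical names, not the CUDASTF API. In the stream model each asynchronous operation yields its own event, and any event that nobody waited on has to be stored so that `finalize()` can synchronize it. In the graph model the same operations are just nodes of one graph, so a single wait on the graph covers all of them and nothing needs to be stored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a CUDA event; it only records whether it was waited on.
struct event {
  bool completed = false;
  void wait() { completed = true; }  // real code would synchronize on a cudaEvent_t
};

// Stream model: every asynchronous operation yields its own event.
// Events that no later task waited on are "dangling" and must be stored
// so that finalize() can synchronize them before the context goes away.
struct stream_model {
  std::vector<event> dangling;

  void submit_async_op() { dangling.push_back(event{}); }

  // Returns how many dangling events had to be waited on individually.
  std::size_t finalize() {
    for (auto& e : dangling) {
      e.wait();
      assert(e.completed);
    }
    std::size_t waited = dangling.size();
    dangling.clear();
    return waited;
  }
};

// Graph model: operations become nodes of a single graph. Waiting for the
// graph to complete implies every node has completed, so no per-node event
// is kept.
struct graph_model {
  std::size_t nodes = 0;
  bool graph_completed = false;

  void submit_async_op() { ++nodes; }

  // Returns how many individual events had to be waited on (always zero).
  std::size_t finalize() {
    graph_completed = true;  // real code would launch the graph and wait for it to finish
    return 0;
  }
};
```

With three operations submitted to each model, `stream_model::finalize()` waits on three stored events, while `graph_model::finalize()` waits on none and relies on the single graph-level wait.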

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.


copy-pr-bot · 2025-01-10T16:27:53Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

caugonnet · 2025-01-10T16:27:57Z

/ok to test

caugonnet · 2025-01-10T16:33:29Z

cudax/include/cuda/experimental/__stf/internal/backend_ctx.cuh

+     * @brief Indicate if the backend needs to keep track of dangling events, or if these will be automatically
+     * synchronized
+     */
+    virtual bool can_ignore_dangling_events() const = 0;


Instead of asking whether we can ignore them, we should ask whether we need to track them, to avoid double negations such as !no_dangling ...
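One way to picture the suggested rename, as a hedged sketch: only the predicate name `track_dangling_events()` is taken from this PR; `backend_sketch`, `add_dangling_event()` and the two derived classes are hypothetical, and integers stand in for CUDA events. With the positive predicate the call site reads `if (track_dangling_events())` instead of `if (!can_ignore_dangling_events())`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified backend interface. Only the predicate name
// track_dangling_events() comes from the PR; everything else is illustrative.
class backend_sketch {
public:
  virtual ~backend_sketch() = default;

  // Positive phrasing: true when the backend must remember dangling events.
  virtual bool track_dangling_events() const = 0;

  // The call site reads directly, with no negation.
  void add_dangling_event(int event_id) {
    if (track_dangling_events()) {
      dangling_.push_back(event_id);
    }
  }

  std::size_t dangling_count() const { return dangling_.size(); }

private:
  std::vector<int> dangling_;  // stand-in for a list of CUDA events
};

class stream_backend_sketch final : public backend_sketch {
public:
  bool track_dangling_events() const override { return true; }
};

class graph_backend_sketch final : public backend_sketch {
public:
  // Graph completion covers every node, so nothing needs to be tracked.
  bool track_dangling_events() const override { return false; }
};
```

The stream-like backend keeps every event it is given while the graph-like backend keeps none, and neither call site has to negate the predicate.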


github-actions · 2025-01-10T17:03:14Z

🟨 CI finished in 31m 42s: Pass: 90%/20 | Total: 4h 07m | Avg: 12m 21s | Max: 15m 44s | Hits: 582%/312

🟨 cudax: Pass: 90%/20 | Total: 4h 07m | Avg: 12m 21s | Max: 15m 44s | Hits: 582%/312

🔍 cpu: amd64 🔍
  🔍 amd64              Pass:  87%/16  | Total:  3h 20m | Avg: 12m 30s | Max: 15m 44s | Hits: 582%/312   
  🟩 arm64              Pass: 100%/4   | Total: 46m 56s | Avg: 11m 44s | Max: 13m 19s
🔍 ctk: 12.6 🔍
  🟩 12.0               Pass: 100%/1   | Total: 10m 52s | Avg: 10m 52s | Max: 10m 52s | Hits: 582%/156   
  🟩 12.5               Pass: 100%/2   | Total: 12m 40s | Avg:  6m 20s | Max:  6m 25s
  🔍 12.6               Pass:  88%/17  | Total:  3h 43m | Avg: 13m 08s | Max: 15m 44s | Hits: 582%/156   
🔍 cudacxx: nvcc12.6 🔍
  🟩 nvcc12.0           Pass: 100%/1   | Total: 10m 52s | Avg: 10m 52s | Max: 10m 52s | Hits: 582%/156   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 40s | Avg:  6m 20s | Max:  6m 25s
  🔍 nvcc12.6           Pass:  88%/17  | Total:  3h 43m | Avg: 13m 08s | Max: 15m 44s | Hits: 582%/156   
🚨 jobs: Test 🚨
  🟩 Build              Pass: 100%/18  | Total:  3h 36m | Avg: 12m 01s | Max: 15m 44s | Hits: 582%/312   
  🔥 Test               Pass:   0%/2   | Total: 30m 35s | Avg: 15m 17s | Max: 15m 43s
🔍 std: 20 🔍
  🟩 17                 Pass: 100%/4   | Total: 37m 53s | Avg:  9m 28s | Max: 10m 59s
  🔍 20                 Pass:  87%/16  | Total:  3h 29m | Avg: 13m 04s | Max: 15m 44s | Hits: 582%/312   
🟨 cxx
  🟩 Clang14            Pass: 100%/1   | Total: 13m 14s | Avg: 13m 14s | Max: 13m 14s
  🟩 Clang15            Pass: 100%/1   | Total: 14m 28s | Avg: 14m 28s | Max: 14m 28s
  🟩 Clang16            Pass: 100%/1   | Total: 14m 35s | Avg: 14m 35s | Max: 14m 35s
  🟩 Clang17            Pass: 100%/1   | Total: 14m 36s | Avg: 14m 36s | Max: 14m 36s
  🟨 Clang18            Pass:  75%/4   | Total: 52m 51s | Avg: 13m 12s | Max: 15m 43s
  🟩 GCC10              Pass: 100%/1   | Total: 13m 24s | Avg: 13m 24s | Max: 13m 24s
  🟩 GCC11              Pass: 100%/1   | Total: 13m 02s | Avg: 13m 02s | Max: 13m 02s
  🟨 GCC12              Pass:  50%/2   | Total: 30m 36s | Avg: 15m 18s | Max: 15m 44s
  🟩 GCC13              Pass: 100%/4   | Total: 44m 51s | Avg: 11m 12s | Max: 13m 19s
  🟩 MSVC14.36          Pass: 100%/1   | Total: 10m 52s | Avg: 10m 52s | Max: 10m 52s | Hits: 582%/156   
  🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 53s | Avg: 11m 53s | Max: 11m 53s | Hits: 582%/156   
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 40s | Avg:  6m 20s | Max:  6m 25s
🟨 cxx_family
  🟨 Clang              Pass:  87%/8   | Total:  1h 49m | Avg: 13m 43s | Max: 15m 43s
  🟨 GCC                Pass:  87%/8   | Total:  1h 41m | Avg: 12m 44s | Max: 15m 44s
  🟩 MSVC               Pass: 100%/2   | Total: 22m 45s | Avg: 11m 22s | Max: 11m 53s | Hits: 582%/312   
  🟩 NVHPC              Pass: 100%/2   | Total: 12m 40s | Avg:  6m 20s | Max:  6m 25s
🟨 cudacxx_family
  🟨 nvcc               Pass:  90%/20  | Total:  4h 07m | Avg: 12m 21s | Max: 15m 44s | Hits: 582%/312   
🟨 gpu
  🟨 v100               Pass:  90%/20  | Total:  4h 07m | Avg: 12m 21s | Max: 15m 44s | Hits: 582%/312   
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  9m 45s | Avg:  9m 45s | Max:  9m 45s
  🟩 90a                Pass: 100%/1   | Total: 10m 48s | Avg: 10m 48s | Max: 10m 48s

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

🏃‍ Runner counts (total jobs: 20)

#	Runner
12	`linux-amd64-cpu16`
4	`linux-arm64-cpu16`
2	`windows-amd64-cpu16`
2	`linux-amd64-gpu-v100-latest-1`

caugonnet · 2025-01-13T08:18:30Z

/ok to test

github-actions · 2025-01-13T08:53:56Z

🟨 CI finished in 33m 52s: Pass: 90%/20 | Total: 4h 13m | Avg: 12m 39s | Max: 18m 03s | Hits: 574%/312

🟨 cudax: Pass: 90%/20 | Total: 4h 13m | Avg: 12m 39s | Max: 18m 03s | Hits: 574%/312

🔍 cpu: amd64 🔍
  🔍 amd64              Pass:  87%/16  | Total:  3h 25m | Avg: 12m 50s | Max: 18m 03s | Hits: 574%/312   
  🟩 arm64              Pass: 100%/4   | Total: 47m 49s | Avg: 11m 57s | Max: 12m 52s
🔍 ctk: 12.6 🔍
  🟩 12.0               Pass: 100%/1   | Total: 12m 24s | Avg: 12m 24s | Max: 12m 24s | Hits: 574%/156   
  🟩 12.5               Pass: 100%/2   | Total: 14m 14s | Avg:  7m 07s | Max:  7m 13s
  🔍 12.6               Pass:  88%/17  | Total:  3h 46m | Avg: 13m 19s | Max: 18m 03s | Hits: 574%/156   
🔍 cudacxx: nvcc12.6 🔍
  🟩 nvcc12.0           Pass: 100%/1   | Total: 12m 24s | Avg: 12m 24s | Max: 12m 24s | Hits: 574%/156   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 14m 14s | Avg:  7m 07s | Max:  7m 13s
  🔍 nvcc12.6           Pass:  88%/17  | Total:  3h 46m | Avg: 13m 19s | Max: 18m 03s | Hits: 574%/156   
🚨 jobs: Test 🚨
  🟩 Build              Pass: 100%/18  | Total:  3h 39m | Avg: 12m 13s | Max: 15m 10s | Hits: 574%/312   
  🔥 Test               Pass:   0%/2   | Total: 33m 17s | Avg: 16m 38s | Max: 18m 03s
🔍 std: 20 🔍
  🟩 17                 Pass: 100%/4   | Total: 39m 54s | Avg:  9m 58s | Max: 11m 25s
  🔍 20                 Pass:  87%/16  | Total:  3h 33m | Avg: 13m 20s | Max: 18m 03s | Hits: 574%/312   
🟨 cxx
  🟩 Clang14            Pass: 100%/1   | Total: 12m 21s | Avg: 12m 21s | Max: 12m 21s
  🟩 Clang15            Pass: 100%/1   | Total: 13m 46s | Avg: 13m 46s | Max: 13m 46s
  🟩 Clang16            Pass: 100%/1   | Total: 13m 14s | Avg: 13m 14s | Max: 13m 14s
  🟩 Clang17            Pass: 100%/1   | Total: 15m 10s | Avg: 15m 10s | Max: 15m 10s
  🟨 Clang18            Pass:  75%/4   | Total: 56m 02s | Avg: 14m 00s | Max: 18m 03s
  🟩 GCC10              Pass: 100%/1   | Total: 14m 35s | Avg: 14m 35s | Max: 14m 35s
  🟩 GCC11              Pass: 100%/1   | Total: 14m 37s | Avg: 14m 37s | Max: 14m 37s
  🟨 GCC12              Pass:  50%/2   | Total: 29m 59s | Avg: 14m 59s | Max: 15m 14s
  🟩 GCC13              Pass: 100%/4   | Total: 44m 25s | Avg: 11m 06s | Max: 12m 52s
  🟩 MSVC14.36          Pass: 100%/1   | Total: 12m 24s | Avg: 12m 24s | Max: 12m 24s | Hits: 574%/156   
  🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 29s | Avg: 12m 29s | Max: 12m 29s | Hits: 574%/156   
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 14m 14s | Avg:  7m 07s | Max:  7m 13s
🟨 cxx_family
  🟨 Clang              Pass:  87%/8   | Total:  1h 50m | Avg: 13m 49s | Max: 18m 03s
  🟨 GCC                Pass:  87%/8   | Total:  1h 43m | Avg: 12m 57s | Max: 15m 14s
  🟩 MSVC               Pass: 100%/2   | Total: 24m 53s | Avg: 12m 26s | Max: 12m 29s | Hits: 574%/312   
  🟩 NVHPC              Pass: 100%/2   | Total: 14m 14s | Avg:  7m 07s | Max:  7m 13s
🟨 cudacxx_family
  🟨 nvcc               Pass:  90%/20  | Total:  4h 13m | Avg: 12m 39s | Max: 18m 03s | Hits: 574%/312   
🟨 gpu
  🟨 v100               Pass:  90%/20  | Total:  4h 13m | Avg: 12m 39s | Max: 18m 03s | Hits: 574%/312   
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  9m 54s | Avg:  9m 54s | Max:  9m 54s
  🟩 90a                Pass: 100%/1   | Total: 10m 17s | Avg: 10m 17s | Max: 10m 17s



caugonnet · 2025-01-13T11:05:13Z

/ok to test

caugonnet · 2025-01-13T11:36:15Z

/ok to test

github-actions · 2025-01-13T12:02:20Z

🟩 CI finished in 24m 42s: Pass: 100%/20 | Total: 1h 51m | Avg: 5m 35s | Max: 20m 24s | Hits: 582%/312

🟩 cudax: Pass: 100%/20 | Total: 1h 51m | Avg: 5m 35s | Max: 20m 24s | Hits: 582%/312

🟩 cpu
  🟩 amd64              Pass: 100%/16  | Total:  1h 41m | Avg:  6m 21s | Max: 20m 24s | Hits: 582%/312   
  🟩 arm64              Pass: 100%/4   | Total: 10m 16s | Avg:  2m 34s | Max:  2m 45s
🟩 ctk
  🟩 12.0               Pass: 100%/1   | Total: 12m 06s | Avg: 12m 06s | Max: 12m 06s | Hits: 582%/156   
  🟩 12.5               Pass: 100%/2   | Total: 10m 11s | Avg:  5m 05s | Max:  5m 09s
  🟩 12.6               Pass: 100%/17  | Total:  1h 29m | Avg:  5m 16s | Max: 20m 24s | Hits: 582%/156   
🟩 cudacxx
  🟩 nvcc12.0           Pass: 100%/1   | Total: 12m 06s | Avg: 12m 06s | Max: 12m 06s | Hits: 582%/156   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 10m 11s | Avg:  5m 05s | Max:  5m 09s
  🟩 nvcc12.6           Pass: 100%/17  | Total:  1h 29m | Avg:  5m 16s | Max: 20m 24s | Hits: 582%/156   
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/20  | Total:  1h 51m | Avg:  5m 35s | Max: 20m 24s | Hits: 582%/312   
🟩 cxx
  🟩 Clang14            Pass: 100%/1   | Total:  3m 08s | Avg:  3m 08s | Max:  3m 08s
  🟩 Clang15            Pass: 100%/1   | Total:  3m 23s | Avg:  3m 23s | Max:  3m 23s
  🟩 Clang16            Pass: 100%/1   | Total:  3m 15s | Avg:  3m 15s | Max:  3m 15s
  🟩 Clang17            Pass: 100%/1   | Total:  3m 18s | Avg:  3m 18s | Max:  3m 18s
  🟩 Clang18            Pass: 100%/4   | Total: 28m 45s | Avg:  7m 11s | Max: 20m 24s
  🟩 GCC10              Pass: 100%/1   | Total:  3m 16s | Avg:  3m 16s | Max:  3m 16s
  🟩 GCC11              Pass: 100%/1   | Total:  3m 11s | Avg:  3m 11s | Max:  3m 11s
  🟩 GCC12              Pass: 100%/2   | Total: 19m 00s | Avg:  9m 30s | Max: 15m 54s
  🟩 GCC13              Pass: 100%/4   | Total: 10m 26s | Avg:  2m 36s | Max:  2m 46s
  🟩 MSVC14.36          Pass: 100%/1   | Total: 12m 06s | Avg: 12m 06s | Max: 12m 06s | Hits: 582%/156   
  🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 54s | Avg: 11m 54s | Max: 11m 54s | Hits: 582%/156   
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 10m 11s | Avg:  5m 05s | Max:  5m 09s
🟩 cxx_family
  🟩 Clang              Pass: 100%/8   | Total: 41m 49s | Avg:  5m 13s | Max: 20m 24s
  🟩 GCC                Pass: 100%/8   | Total: 35m 53s | Avg:  4m 29s | Max: 15m 54s
  🟩 MSVC               Pass: 100%/2   | Total: 24m 00s | Avg: 12m 00s | Max: 12m 06s | Hits: 582%/312   
  🟩 NVHPC              Pass: 100%/2   | Total: 10m 11s | Avg:  5m 05s | Max:  5m 09s
🟩 gpu
  🟩 v100               Pass: 100%/20  | Total:  1h 51m | Avg:  5m 35s | Max: 20m 24s | Hits: 582%/312   
🟩 jobs
  🟩 Build              Pass: 100%/18  | Total:  1h 15m | Avg:  4m 11s | Max: 12m 06s | Hits: 582%/312   
  🟩 Test               Pass: 100%/2   | Total: 36m 18s | Avg: 18m 09s | Max: 20m 24s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  2m 46s | Avg:  2m 46s | Max:  2m 46s
  🟩 90a                Pass: 100%/1   | Total:  2m 37s | Avg:  2m 37s | Max:  2m 37s
🟩 std
  🟩 17                 Pass: 100%/4   | Total: 12m 45s | Avg:  3m 11s | Max:  5m 02s
  🟩 20                 Pass: 100%/16  | Total:  1h 39m | Avg:  6m 11s | Max: 20m 24s | Hits: 582%/312


Unlike the CUDA stream backend, nodes in a CUDA graph are necessarily done when the CUDA graph completes. Therefore keeping track of "dangling events" is a waste of time and resources.

85d03a6

caugonnet added the stf Sequential Task Flow programming model label Jan 10, 2025

caugonnet commented Jan 10, 2025


replace can_ignore_dangling_events by track_dangling_events which leads to more readable code

e80c436

Merge branch 'main' into stf_graph_no_dangling_events

564705d

caugonnet marked this pull request as ready for review January 13, 2025 08:18

caugonnet requested a review from a team as a code owner January 13, 2025 08:18

miscco approved these changes Jan 13, 2025


When not storing the dangling events, we must still perform the deinit operations that were producing these events!

e52651e
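The point of this commit can be illustrated with a small sketch, assuming a simplified model in which a hypothetical `run_deinit_op()` performs an asynchronous cleanup and returns an integer standing in for its event; `deinit_sketch` is likewise illustrative and not the actual STF code. The cleanup always runs, and only storing the resulting event depends on `track_dangling_events`.

```cpp
#include <vector>

// Hypothetical model of releasing a resource at the end of a context.
struct deinit_sketch {
  bool track_dangling_events;  // false for a graph-like backend
  std::vector<int> dangling;   // stand-in for stored CUDA events
  int deinit_ops_run = 0;      // counts cleanups actually performed

  // Hypothetical asynchronous cleanup (e.g. freeing a buffer); returns an event id.
  int run_deinit_op() { return ++deinit_ops_run; }

  void release_resource() {
    // Wrong shortcut: returning early here when !track_dangling_events would
    // also skip the cleanup and leak the resource.
    int ev = run_deinit_op();  // always perform the deinit operation
    if (track_dangling_events) {
      dangling.push_back(ev);  // only the bookkeeping is optional
    }
  }
};
```

An untracked instance still counts every cleanup it performs while storing no events, which is exactly the behavior the commit message asks for.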

Merge branch 'main' into stf_graph_no_dangling_events

9d37abb

caugonnet merged commit cda5501 into NVIDIA:main Jan 13, 2025
35 checks passed
