[STF] Do not keep track of dangling events in a CUDA graph backend #3327

Merged
merged 5 commits into from
Jan 13, 2025

Conversation

caugonnet
Contributor

Description

Unlike the CUDA stream backend, nodes in a CUDA graph are necessarily done when the CUDA graph completes. Therefore keeping track of "dangling events" is a waste of time and resources.
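
To illustrate the reasoning, here is a minimal C++ sketch, not the actual STF implementation. Only the predicate name `can_ignore_dangling_events` is taken from this PR; `event`, `backend`, `stream_backend`, `graph_backend`, `dangling_tracker` and `tracked_after` are hypothetical names introduced for the example. It assumes a tracker that simply skips bookkeeping when the backend reports that completion of the whole graph already implies completion of every node.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an event produced by an asynchronous operation.
struct event
{
  int id;
};

// Hypothetical backend interface; only the predicate name comes from the PR.
class backend
{
public:
  virtual ~backend() = default;

  // True when every operation is known to be complete once the backend
  // finishes, so leftover ("dangling") events need not be recorded.
  virtual bool can_ignore_dangling_events() const = 0;
};

// Stream-like backend: work may still be in flight, so events must be kept
// until something synchronizes with them.
class stream_backend final : public backend
{
public:
  bool can_ignore_dangling_events() const override
  {
    return false;
  }
};

// Graph-like backend: all nodes are done when the graph completes.
class graph_backend final : public backend
{
public:
  bool can_ignore_dangling_events() const override
  {
    return true;
  }
};

// Records events that nobody waited on, unless the backend makes that useless.
class dangling_tracker
{
public:
  explicit dangling_tracker(const backend& b)
      : backend_(b)
  {}

  void add(event e)
  {
    if (backend_.can_ignore_dangling_events())
    {
      return; // nothing to store, nothing to synchronize later
    }
    pending_.push_back(e);
  }

  std::size_t size() const
  {
    return pending_.size();
  }

private:
  const backend& backend_;
  std::vector<event> pending_;
};

// Helper: number of events still tracked after n operations on backend b.
inline std::size_t tracked_after(const backend& b, int n)
{
  assert(n >= 0);
  dangling_tracker t(b);
  for (int i = 0; i < n; ++i)
  {
    t.add(event{i});
  }
  return t.size();
}
```

With these assumptions, `tracked_after(stream_backend{}, 3)` keeps all three events, while `tracked_after(graph_backend{}, 3)` keeps none, which is the time and memory saving the description refers to.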

closes

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.


copy-pr-bot bot commented Jan 10, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@caugonnet
Contributor Author

/ok to test

@caugonnet caugonnet added the stf Sequential Task Flow programming model label Jan 10, 2025
/**
 * @brief Indicate if the backend needs to keep track of dangling events, or if these will be automatically
 * synchronized
 */
virtual bool can_ignore_dangling_events() const = 0;
Contributor Author

Instead of keeping track of whether we can ignore them, we should check whether we need them, to avoid !no_dangling ...

Contributor

🟨 CI finished in 31m 42s: Pass: 90%/20 | Total: 4h 07m | Avg: 12m 21s | Max: 15m 44s | Hits: 582%/312
  • 🟨 cudax: Pass: 90%/20 | Total: 4h 07m | Avg: 12m 21s | Max: 15m 44s | Hits: 582%/312

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  87%/16  | Total:  3h 20m | Avg: 12m 30s | Max: 15m 44s | Hits: 582%/312   
      🟩 arm64              Pass: 100%/4   | Total: 46m 56s | Avg: 11m 44s | Max: 13m 19s
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/1   | Total: 10m 52s | Avg: 10m 52s | Max: 10m 52s | Hits: 582%/156   
      🟩 12.5               Pass: 100%/2   | Total: 12m 40s | Avg:  6m 20s | Max:  6m 25s
      🔍 12.6               Pass:  88%/17  | Total:  3h 43m | Avg: 13m 08s | Max: 15m 44s | Hits: 582%/156   
    🔍 cudacxx: nvcc12.6 🔍
      🟩 nvcc12.0           Pass: 100%/1   | Total: 10m 52s | Avg: 10m 52s | Max: 10m 52s | Hits: 582%/156   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 40s | Avg:  6m 20s | Max:  6m 25s
      🔍 nvcc12.6           Pass:  88%/17  | Total:  3h 43m | Avg: 13m 08s | Max: 15m 44s | Hits: 582%/156   
    🚨 jobs: Test 🚨
      🟩 Build              Pass: 100%/18  | Total:  3h 36m | Avg: 12m 01s | Max: 15m 44s | Hits: 582%/312   
      🔥 Test               Pass:   0%/2   | Total: 30m 35s | Avg: 15m 17s | Max: 15m 43s
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/4   | Total: 37m 53s | Avg:  9m 28s | Max: 10m 59s
      🔍 20                 Pass:  87%/16  | Total:  3h 29m | Avg: 13m 04s | Max: 15m 44s | Hits: 582%/312   
    🟨 cxx
      🟩 Clang14            Pass: 100%/1   | Total: 13m 14s | Avg: 13m 14s | Max: 13m 14s
      🟩 Clang15            Pass: 100%/1   | Total: 14m 28s | Avg: 14m 28s | Max: 14m 28s
      🟩 Clang16            Pass: 100%/1   | Total: 14m 35s | Avg: 14m 35s | Max: 14m 35s
      🟩 Clang17            Pass: 100%/1   | Total: 14m 36s | Avg: 14m 36s | Max: 14m 36s
      🟨 Clang18            Pass:  75%/4   | Total: 52m 51s | Avg: 13m 12s | Max: 15m 43s
      🟩 GCC10              Pass: 100%/1   | Total: 13m 24s | Avg: 13m 24s | Max: 13m 24s
      🟩 GCC11              Pass: 100%/1   | Total: 13m 02s | Avg: 13m 02s | Max: 13m 02s
      🟨 GCC12              Pass:  50%/2   | Total: 30m 36s | Avg: 15m 18s | Max: 15m 44s
      🟩 GCC13              Pass: 100%/4   | Total: 44m 51s | Avg: 11m 12s | Max: 13m 19s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 10m 52s | Avg: 10m 52s | Max: 10m 52s | Hits: 582%/156   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 53s | Avg: 11m 53s | Max: 11m 53s | Hits: 582%/156   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 40s | Avg:  6m 20s | Max:  6m 25s
    🟨 cxx_family
      🟨 Clang              Pass:  87%/8   | Total:  1h 49m | Avg: 13m 43s | Max: 15m 43s
      🟨 GCC                Pass:  87%/8   | Total:  1h 41m | Avg: 12m 44s | Max: 15m 44s
      🟩 MSVC               Pass: 100%/2   | Total: 22m 45s | Avg: 11m 22s | Max: 11m 53s | Hits: 582%/312   
      🟩 NVHPC              Pass: 100%/2   | Total: 12m 40s | Avg:  6m 20s | Max:  6m 25s
    🟨 cudacxx_family
      🟨 nvcc               Pass:  90%/20  | Total:  4h 07m | Avg: 12m 21s | Max: 15m 44s | Hits: 582%/312   
    🟨 gpu
      🟨 v100               Pass:  90%/20  | Total:  4h 07m | Avg: 12m 21s | Max: 15m 44s | Hits: 582%/312   
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  9m 45s | Avg:  9m 45s | Max:  9m 45s
      🟩 90a                Pass: 100%/1   | Total: 10m 48s | Avg: 10m 48s | Max: 10m 48s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 20)

# Runner
12 linux-amd64-cpu16
4 linux-arm64-cpu16
2 windows-amd64-cpu16
2 linux-amd64-gpu-v100-latest-1

@caugonnet caugonnet marked this pull request as ready for review January 13, 2025 08:18
@caugonnet caugonnet requested a review from a team as a code owner January 13, 2025 08:18
@caugonnet
Contributor Author

/ok to test

Contributor

🟨 CI finished in 33m 52s: Pass: 90%/20 | Total: 4h 13m | Avg: 12m 39s | Max: 18m 03s | Hits: 574%/312
  • 🟨 cudax: Pass: 90%/20 | Total: 4h 13m | Avg: 12m 39s | Max: 18m 03s | Hits: 574%/312

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  87%/16  | Total:  3h 25m | Avg: 12m 50s | Max: 18m 03s | Hits: 574%/312   
      🟩 arm64              Pass: 100%/4   | Total: 47m 49s | Avg: 11m 57s | Max: 12m 52s
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/1   | Total: 12m 24s | Avg: 12m 24s | Max: 12m 24s | Hits: 574%/156   
      🟩 12.5               Pass: 100%/2   | Total: 14m 14s | Avg:  7m 07s | Max:  7m 13s
      🔍 12.6               Pass:  88%/17  | Total:  3h 46m | Avg: 13m 19s | Max: 18m 03s | Hits: 574%/156   
    🔍 cudacxx: nvcc12.6 🔍
      🟩 nvcc12.0           Pass: 100%/1   | Total: 12m 24s | Avg: 12m 24s | Max: 12m 24s | Hits: 574%/156   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 14m 14s | Avg:  7m 07s | Max:  7m 13s
      🔍 nvcc12.6           Pass:  88%/17  | Total:  3h 46m | Avg: 13m 19s | Max: 18m 03s | Hits: 574%/156   
    🚨 jobs: Test 🚨
      🟩 Build              Pass: 100%/18  | Total:  3h 39m | Avg: 12m 13s | Max: 15m 10s | Hits: 574%/312   
      🔥 Test               Pass:   0%/2   | Total: 33m 17s | Avg: 16m 38s | Max: 18m 03s
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/4   | Total: 39m 54s | Avg:  9m 58s | Max: 11m 25s
      🔍 20                 Pass:  87%/16  | Total:  3h 33m | Avg: 13m 20s | Max: 18m 03s | Hits: 574%/312   
    🟨 cxx
      🟩 Clang14            Pass: 100%/1   | Total: 12m 21s | Avg: 12m 21s | Max: 12m 21s
      🟩 Clang15            Pass: 100%/1   | Total: 13m 46s | Avg: 13m 46s | Max: 13m 46s
      🟩 Clang16            Pass: 100%/1   | Total: 13m 14s | Avg: 13m 14s | Max: 13m 14s
      🟩 Clang17            Pass: 100%/1   | Total: 15m 10s | Avg: 15m 10s | Max: 15m 10s
      🟨 Clang18            Pass:  75%/4   | Total: 56m 02s | Avg: 14m 00s | Max: 18m 03s
      🟩 GCC10              Pass: 100%/1   | Total: 14m 35s | Avg: 14m 35s | Max: 14m 35s
      🟩 GCC11              Pass: 100%/1   | Total: 14m 37s | Avg: 14m 37s | Max: 14m 37s
      🟨 GCC12              Pass:  50%/2   | Total: 29m 59s | Avg: 14m 59s | Max: 15m 14s
      🟩 GCC13              Pass: 100%/4   | Total: 44m 25s | Avg: 11m 06s | Max: 12m 52s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 12m 24s | Avg: 12m 24s | Max: 12m 24s | Hits: 574%/156   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 29s | Avg: 12m 29s | Max: 12m 29s | Hits: 574%/156   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 14m 14s | Avg:  7m 07s | Max:  7m 13s
    🟨 cxx_family
      🟨 Clang              Pass:  87%/8   | Total:  1h 50m | Avg: 13m 49s | Max: 18m 03s
      🟨 GCC                Pass:  87%/8   | Total:  1h 43m | Avg: 12m 57s | Max: 15m 14s
      🟩 MSVC               Pass: 100%/2   | Total: 24m 53s | Avg: 12m 26s | Max: 12m 29s | Hits: 574%/312   
      🟩 NVHPC              Pass: 100%/2   | Total: 14m 14s | Avg:  7m 07s | Max:  7m 13s
    🟨 cudacxx_family
      🟨 nvcc               Pass:  90%/20  | Total:  4h 13m | Avg: 12m 39s | Max: 18m 03s | Hits: 574%/312   
    🟨 gpu
      🟨 v100               Pass:  90%/20  | Total:  4h 13m | Avg: 12m 39s | Max: 18m 03s | Hits: 574%/312   
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  9m 54s | Avg:  9m 54s | Max:  9m 54s
      🟩 90a                Pass: 100%/1   | Total: 10m 17s | Avg: 10m 17s | Max: 10m 17s
    


…t operations that were producing these events !
@caugonnet
Contributor Author

/ok to test

@caugonnet
Contributor Author

/ok to test

Contributor

🟩 CI finished in 24m 42s: Pass: 100%/20 | Total: 1h 51m | Avg: 5m 35s | Max: 20m 24s | Hits: 582%/312
  • 🟩 cudax: Pass: 100%/20 | Total: 1h 51m | Avg: 5m 35s | Max: 20m 24s | Hits: 582%/312

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  1h 41m | Avg:  6m 21s | Max: 20m 24s | Hits: 582%/312   
      🟩 arm64              Pass: 100%/4   | Total: 10m 16s | Avg:  2m 34s | Max:  2m 45s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 12m 06s | Avg: 12m 06s | Max: 12m 06s | Hits: 582%/156   
      🟩 12.5               Pass: 100%/2   | Total: 10m 11s | Avg:  5m 05s | Max:  5m 09s
      🟩 12.6               Pass: 100%/17  | Total:  1h 29m | Avg:  5m 16s | Max: 20m 24s | Hits: 582%/156   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 12m 06s | Avg: 12m 06s | Max: 12m 06s | Hits: 582%/156   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 10m 11s | Avg:  5m 05s | Max:  5m 09s
      🟩 nvcc12.6           Pass: 100%/17  | Total:  1h 29m | Avg:  5m 16s | Max: 20m 24s | Hits: 582%/156   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  1h 51m | Avg:  5m 35s | Max: 20m 24s | Hits: 582%/312   
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 08s | Avg:  3m 08s | Max:  3m 08s
      🟩 Clang15            Pass: 100%/1   | Total:  3m 23s | Avg:  3m 23s | Max:  3m 23s
      🟩 Clang16            Pass: 100%/1   | Total:  3m 15s | Avg:  3m 15s | Max:  3m 15s
      🟩 Clang17            Pass: 100%/1   | Total:  3m 18s | Avg:  3m 18s | Max:  3m 18s
      🟩 Clang18            Pass: 100%/4   | Total: 28m 45s | Avg:  7m 11s | Max: 20m 24s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 16s | Avg:  3m 16s | Max:  3m 16s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 11s | Avg:  3m 11s | Max:  3m 11s
      🟩 GCC12              Pass: 100%/2   | Total: 19m 00s | Avg:  9m 30s | Max: 15m 54s
      🟩 GCC13              Pass: 100%/4   | Total: 10m 26s | Avg:  2m 36s | Max:  2m 46s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 12m 06s | Avg: 12m 06s | Max: 12m 06s | Hits: 582%/156   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 54s | Avg: 11m 54s | Max: 11m 54s | Hits: 582%/156   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 10m 11s | Avg:  5m 05s | Max:  5m 09s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 41m 49s | Avg:  5m 13s | Max: 20m 24s
      🟩 GCC                Pass: 100%/8   | Total: 35m 53s | Avg:  4m 29s | Max: 15m 54s
      🟩 MSVC               Pass: 100%/2   | Total: 24m 00s | Avg: 12m 00s | Max: 12m 06s | Hits: 582%/312   
      🟩 NVHPC              Pass: 100%/2   | Total: 10m 11s | Avg:  5m 05s | Max:  5m 09s
    🟩 gpu
      🟩 v100               Pass: 100%/20  | Total:  1h 51m | Avg:  5m 35s | Max: 20m 24s | Hits: 582%/312   
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  1h 15m | Avg:  4m 11s | Max: 12m 06s | Hits: 582%/312   
      🟩 Test               Pass: 100%/2   | Total: 36m 18s | Avg: 18m 09s | Max: 20m 24s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 46s | Avg:  2m 46s | Max:  2m 46s
      🟩 90a                Pass: 100%/1   | Total:  2m 37s | Avg:  2m 37s | Max:  2m 37s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 12m 45s | Avg:  3m 11s | Max:  5m 02s
      🟩 20                 Pass: 100%/16  | Total:  1h 39m | Avg:  6m 11s | Max: 20m 24s | Hits: 582%/312   
    


@caugonnet caugonnet merged commit cda5501 into NVIDIA:main Jan 13, 2025
35 checks passed
Labels
stf Sequential Task Flow programming model
Projects
Status: Done

2 participants