Backport to 2.8: some FP8 support #3479

Merged
miscco merged 3 commits into NVIDIA:branch/2.8.x on Jan 22, 2025

bernhardmgruber
Contributor

This backports 3 dependent features, which is why I created a single backport PR for them.

fbusato and others added 3 commits January 22, 2025 16:52
Also ensure that we actually can enable FP8 due to FP16 and BF16 requirements

Co-authored-by: Michael Schellenberger Costa

🟩 CI finished in 2h 18m: Pass: 100%/170 | Total: 3d 13h | Avg: 30m 13s | Max: 1h 22m | Hits: 199%/22580
  • 🟩 libcudacxx: Pass: 100%/48 | Total: 17h 40m | Avg: 22m 05s | Max: 1h 01m | Hits: 324%/9876

    🟩 cpu
      🟩 amd64              Pass: 100%/46  | Total: 16h 56m | Avg: 22m 05s | Max:  1h 01m | Hits: 324%/9876  
      🟩 arm64              Pass: 100%/2   | Total: 43m 35s | Avg: 21m 47s | Max: 21m 49s
    🟩 ctk
      🟩 11.1               Pass: 100%/7   | Total: 53m 24s | Avg:  7m 37s | Max: 36m 03s | Hits: 329%/2286  
      🟩 12.5               Pass: 100%/2   | Total:  1h 13m | Avg: 36m 35s | Max: 37m 03s
      🟩 12.6               Pass: 100%/39  | Total: 15h 33m | Avg: 23m 56s | Max:  1h 01m | Hits: 323%/7590  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 05m | Avg: 16m 26s | Max: 20m 29s
      🟩 nvcc11.1           Pass: 100%/7   | Total: 53m 24s | Avg:  7m 37s | Max: 36m 03s | Hits: 329%/2286  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 13m | Avg: 36m 35s | Max: 37m 03s
      🟩 nvcc12.6           Pass: 100%/35  | Total: 14h 27m | Avg: 24m 47s | Max:  1h 01m | Hits: 323%/7590  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 05m | Avg: 16m 26s | Max: 20m 29s
      🟩 nvcc               Pass: 100%/44  | Total: 16h 34m | Avg: 22m 35s | Max:  1h 01m | Hits: 324%/9876  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total: 47m 43s | Avg: 11m 55s | Max: 22m 26s
      🟩 Clang10            Pass: 100%/1   | Total: 26m 25s | Avg: 26m 25s | Max: 26m 25s
      🟩 Clang11            Pass: 100%/1   | Total: 23m 50s | Avg: 23m 50s | Max: 23m 50s
      🟩 Clang12            Pass: 100%/1   | Total: 23m 35s | Avg: 23m 35s | Max: 23m 35s
      🟩 Clang13            Pass: 100%/1   | Total: 23m 41s | Avg: 23m 41s | Max: 23m 41s
      🟩 Clang14            Pass: 100%/1   | Total: 22m 04s | Avg: 22m 04s | Max: 22m 04s
      🟩 Clang15            Pass: 100%/1   | Total: 22m 25s | Avg: 22m 25s | Max: 22m 25s
      🟩 Clang16            Pass: 100%/1   | Total: 22m 50s | Avg: 22m 50s | Max: 22m 50s
      🟩 Clang17            Pass: 100%/1   | Total: 23m 37s | Avg: 23m 37s | Max: 23m 37s
      🟩 Clang18            Pass: 100%/8   | Total:  3h 19m | Avg: 24m 58s | Max:  1h 01m
      🟩 GCC6               Pass: 100%/2   | Total:  5m 04s | Avg:  2m 32s | Max:  2m 45s
      🟩 GCC7               Pass: 100%/2   | Total: 33m 18s | Avg: 16m 39s | Max: 17m 07s
      🟩 GCC8               Pass: 100%/1   | Total: 21m 16s | Avg: 21m 16s | Max: 21m 16s
      🟩 GCC9               Pass: 100%/3   | Total: 30m 38s | Avg: 10m 12s | Max: 25m 16s
      🟩 GCC10              Pass: 100%/1   | Total: 23m 33s | Avg: 23m 33s | Max: 23m 33s
      🟩 GCC11              Pass: 100%/1   | Total: 24m 04s | Avg: 24m 04s | Max: 24m 04s
      🟩 GCC12              Pass: 100%/1   | Total: 27m 33s | Avg: 27m 33s | Max: 27m 33s
      🟩 GCC13              Pass: 100%/10  | Total:  3h 24m | Avg: 20m 26s | Max: 30m 31s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total: 28m 58s | Avg: 28m 58s | Max: 28m 58s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 36m 03s | Avg: 36m 03s | Max: 36m 03s | Hits: 329%/2286  
      🟩 MSVC14.29          Pass: 100%/1   | Total: 37m 48s | Avg: 37m 48s | Max: 37m 48s | Hits: 324%/2481  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 18m | Avg: 39m 08s | Max: 41m 00s | Hits: 322%/5109  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 13m | Avg: 36m 35s | Max: 37m 03s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/20  | Total:  7h 15m | Avg: 21m 47s | Max:  1h 01m
      🟩 GCC                Pass: 100%/21  | Total:  6h 09m | Avg: 17m 36s | Max: 30m 31s
      🟩 Intel              Pass: 100%/1   | Total: 28m 58s | Avg: 28m 58s | Max: 28m 58s
      🟩 MSVC               Pass: 100%/4   | Total:  2h 32m | Avg: 38m 01s | Max: 41m 00s | Hits: 324%/9876  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 13m | Avg: 36m 35s | Max: 37m 03s
    🟩 gpu
      🟩 v100               Pass: 100%/48  | Total: 17h 40m | Avg: 22m 05s | Max:  1h 01m | Hits: 324%/9876  
    🟩 jobs
      🟩 Build              Pass: 100%/41  | Total: 14h 37m | Avg: 21m 24s | Max: 41m 00s | Hits: 324%/9876  
      🟩 NVRTC              Pass: 100%/4   | Total:  1h 43m | Avg: 25m 46s | Max: 30m 31s
      🟩 Test               Pass: 100%/2   | Total:  1h 17m | Avg: 38m 41s | Max:  1h 01m
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  1m 59s | Avg:  1m 59s | Max:  1m 59s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total: 12m 44s | Avg: 12m 44s | Max: 12m 44s
      🟩 90a                Pass: 100%/2   | Total: 31m 06s | Avg: 15m 33s | Max: 17m 20s
    🟩 std
      🟩 11                 Pass: 100%/6   | Total:  1h 01m | Avg: 10m 11s | Max: 18m 28s
      🟩 14                 Pass: 100%/5   | Total:  1h 47m | Avg: 21m 35s | Max: 36m 03s | Hits: 329%/2286  
      🟩 17                 Pass: 100%/13  | Total:  5h 12m | Avg: 24m 00s | Max: 37m 48s | Hits: 324%/4962  
      🟩 20                 Pass: 100%/23  | Total:  9h 36m | Avg: 25m 04s | Max:  1h 01m | Hits: 321%/2628  
    
  • 🟩 cub: Pass: 100%/47 | Total: 1d 12h | Avg: 46m 52s | Max: 1h 15m | Hits: 27%/3132

    🟩 cpu
      🟩 amd64              Pass: 100%/45  | Total:  1d 10h | Avg: 46m 13s | Max:  1h 15m | Hits:  27%/3132  
      🟩 arm64              Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 01m
    🟩 ctk
      🟩 11.1               Pass: 100%/7   | Total:  1h 30m | Avg: 12m 51s | Max:  1h 03m | Hits:  30%/783   
      🟩 12.5               Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 14m
      🟩 12.6               Pass: 100%/38  | Total:  1d 08h | Avg: 51m 51s | Max:  1h 15m | Hits:  26%/2349  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 10m
      🟩 nvcc11.1           Pass: 100%/7   | Total:  1h 30m | Avg: 12m 51s | Max:  1h 03m | Hits:  30%/783   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 14m
      🟩 nvcc12.6           Pass: 100%/36  | Total:  1d 06h | Avg: 51m 09s | Max:  1h 15m | Hits:  26%/2349  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 10m
      🟩 nvcc               Pass: 100%/45  | Total:  1d 10h | Avg: 46m 05s | Max:  1h 15m | Hits:  27%/3132  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total:  2h 06m | Avg: 31m 37s | Max:  1h 05m
      🟩 Clang10            Pass: 100%/1   | Total: 54m 43s | Avg: 54m 43s | Max: 54m 43s
      🟩 Clang11            Pass: 100%/1   | Total: 59m 00s | Avg: 59m 00s | Max: 59m 00s
      🟩 Clang12            Pass: 100%/1   | Total: 59m 02s | Avg: 59m 02s | Max: 59m 02s
      🟩 Clang13            Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
      🟩 Clang14            Pass: 100%/1   | Total:  1h 06m | Avg:  1h 06m | Max:  1h 06m
      🟩 Clang15            Pass: 100%/1   | Total: 57m 22s | Avg: 57m 22s | Max: 57m 22s
      🟩 Clang16            Pass: 100%/1   | Total: 53m 58s | Avg: 53m 58s | Max: 53m 58s
      🟩 Clang17            Pass: 100%/1   | Total: 53m 04s | Avg: 53m 04s | Max: 53m 04s
      🟩 Clang18            Pass: 100%/7   | Total:  5h 38m | Avg: 48m 22s | Max:  1h 10m
      🟩 GCC6               Pass: 100%/2   | Total:  9m 07s | Avg:  4m 33s | Max:  4m 35s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 58m | Avg: 59m 15s | Max:  1h 00m
      🟩 GCC8               Pass: 100%/1   | Total:  1h 05m | Avg:  1h 05m | Max:  1h 05m
      🟩 GCC9               Pass: 100%/3   | Total:  1h 01m | Avg: 20m 22s | Max: 52m 31s
      🟩 GCC10              Pass: 100%/1   | Total: 57m 34s | Avg: 57m 34s | Max: 57m 34s
      🟩 GCC11              Pass: 100%/1   | Total: 56m 46s | Avg: 56m 46s | Max: 56m 46s
      🟩 GCC12              Pass: 100%/3   | Total:  1h 41m | Avg: 33m 54s | Max: 59m 24s
      🟩 GCC13              Pass: 100%/8   | Total:  5h 13m | Avg: 39m 12s | Max:  1h 01m
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
      🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m | Hits:  30%/783   
      🟩 MSVC14.29          Pass: 100%/1   | Total:  1h 15m | Avg:  1h 15m | Max:  1h 15m | Hits:  26%/783   
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 23m | Avg:  1h 11m | Max:  1h 13m | Hits:  26%/1566  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 14m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total: 15h 31m | Avg: 49m 00s | Max:  1h 10m
      🟩 GCC                Pass: 100%/21  | Total: 13h 03m | Avg: 37m 19s | Max:  1h 05m
      🟩 Intel              Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 42m | Avg:  1h 10m | Max:  1h 15m | Hits:  27%/3132  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 14m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 42m 18s | Avg: 21m 09s | Max: 26m 04s
      🟩 v100               Pass: 100%/45  | Total:  1d 12h | Avg: 48m 00s | Max:  1h 15m | Hits:  27%/3132  
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total:  1d 09h | Avg: 50m 48s | Max:  1h 15m | Hits:  27%/3132  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 29m 28s | Avg: 29m 28s | Max: 29m 28s
      🟩 GraphCapture       Pass: 100%/1   | Total: 25m 17s | Avg: 25m 17s | Max: 25m 17s
      🟩 HostLaunch         Pass: 100%/3   | Total: 57m 13s | Avg: 19m 04s | Max: 21m 18s
      🟩 TestGPU            Pass: 100%/2   | Total: 58m 21s | Avg: 29m 10s | Max: 35m 50s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 42m 18s | Avg: 21m 09s | Max: 26m 04s
      🟩 90a                Pass: 100%/1   | Total: 25m 29s | Avg: 25m 29s | Max: 25m 29s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total:  2h 16m | Avg: 27m 16s | Max:  1h 05m
      🟩 14                 Pass: 100%/4   | Total:  3h 00m | Avg: 45m 09s | Max:  1h 03m | Hits:  30%/783   
      🟩 17                 Pass: 100%/12  | Total: 10h 45m | Avg: 53m 45s | Max:  1h 15m | Hits:  26%/1566  
      🟩 20                 Pass: 100%/26  | Total: 20h 40m | Avg: 47m 43s | Max:  1h 13m | Hits:  27%/783   
    
  • 🟩 thrust: Pass: 100%/46 | Total: 1d 00h | Avg: 31m 51s | Max: 1h 22m | Hits: 128%/9260

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 40m 31s | Avg: 20m 15s | Max: 27m 49s
    🟩 cpu
      🟩 amd64              Pass: 100%/44  | Total: 23h 24m | Avg: 31m 55s | Max:  1h 22m | Hits: 128%/9260  
      🟩 arm64              Pass: 100%/2   | Total:  1h 00m | Avg: 30m 29s | Max: 32m 25s
    🟩 ctk
      🟩 11.1               Pass: 100%/7   | Total:  1h 28m | Avg: 12m 37s | Max:  1h 03m | Hits:  88%/1852  
      🟩 12.5               Pass: 100%/2   | Total:  2h 23m | Avg:  1h 11m | Max:  1h 16m
      🟩 12.6               Pass: 100%/37  | Total: 20h 34m | Avg: 33m 21s | Max:  1h 22m | Hits: 138%/7408  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 56m 39s | Avg: 28m 19s | Max: 28m 32s
      🟩 nvcc11.1           Pass: 100%/7   | Total:  1h 28m | Avg: 12m 37s | Max:  1h 03m | Hits:  88%/1852  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 23m | Avg:  1h 11m | Max:  1h 16m
      🟩 nvcc12.6           Pass: 100%/35  | Total: 19h 37m | Avg: 33m 38s | Max:  1h 22m | Hits: 138%/7408  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 56m 39s | Avg: 28m 19s | Max: 28m 32s
      🟩 nvcc               Pass: 100%/44  | Total: 23h 29m | Avg: 32m 01s | Max:  1h 22m | Hits: 128%/9260  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total:  1h 15m | Avg: 18m 54s | Max: 37m 23s
      🟩 Clang10            Pass: 100%/1   | Total: 34m 13s | Avg: 34m 13s | Max: 34m 13s
      🟩 Clang11            Pass: 100%/1   | Total: 31m 28s | Avg: 31m 28s | Max: 31m 28s
      🟩 Clang12            Pass: 100%/1   | Total: 31m 40s | Avg: 31m 40s | Max: 31m 40s
      🟩 Clang13            Pass: 100%/1   | Total: 37m 20s | Avg: 37m 20s | Max: 37m 20s
      🟩 Clang14            Pass: 100%/1   | Total: 33m 24s | Avg: 33m 24s | Max: 33m 24s
      🟩 Clang15            Pass: 100%/1   | Total: 30m 24s | Avg: 30m 24s | Max: 30m 24s
      🟩 Clang16            Pass: 100%/1   | Total: 34m 08s | Avg: 34m 08s | Max: 34m 08s
      🟩 Clang17            Pass: 100%/1   | Total: 30m 15s | Avg: 30m 15s | Max: 30m 15s
      🟩 Clang18            Pass: 100%/7   | Total:  2h 51m | Avg: 24m 28s | Max: 33m 39s
      🟩 GCC6               Pass: 100%/2   | Total:  8m 12s | Avg:  4m 06s | Max:  4m 10s
      🟩 GCC7               Pass: 100%/2   | Total: 58m 00s | Avg: 29m 00s | Max: 33m 25s
      🟩 GCC8               Pass: 100%/1   | Total: 30m 52s | Avg: 30m 52s | Max: 30m 52s
      🟩 GCC9               Pass: 100%/3   | Total: 40m 10s | Avg: 13m 23s | Max: 31m 51s
      🟩 GCC10              Pass: 100%/1   | Total: 32m 32s | Avg: 32m 32s | Max: 32m 32s
      🟩 GCC11              Pass: 100%/1   | Total: 34m 38s | Avg: 34m 38s | Max: 34m 38s
      🟩 GCC12              Pass: 100%/1   | Total: 34m 21s | Avg: 34m 21s | Max: 34m 21s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 01m | Avg: 22m 40s | Max: 35m 16s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total: 49m 46s | Avg: 49m 46s | Max: 49m 46s
      🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m | Hits:  88%/1852  
      🟩 MSVC14.29          Pass: 100%/1   | Total:  1h 22m | Avg:  1h 22m | Max:  1h 22m | Hits:  63%/1852  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 17m | Avg:  1h 05m | Max:  1h 22m | Hits: 164%/5556  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 23m | Avg:  1h 11m | Max:  1h 16m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  8h 29m | Avg: 26m 49s | Max: 37m 23s
      🟩 GCC                Pass: 100%/19  | Total:  7h 00m | Avg: 22m 06s | Max: 35m 16s
      🟩 Intel              Pass: 100%/1   | Total: 49m 46s | Avg: 49m 46s | Max: 49m 46s
      🟩 MSVC               Pass: 100%/5   | Total:  5h 42m | Avg:  1h 08m | Max:  1h 22m | Hits: 128%/9260  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 23m | Avg:  1h 11m | Max:  1h 16m
    🟩 gpu
      🟩 v100               Pass: 100%/46  | Total:  1d 00h | Avg: 31m 51s | Max:  1h 22m | Hits: 128%/9260  
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total: 22h 55m | Avg: 34m 22s | Max:  1h 22m | Hits:  69%/7408  
      🟩 TestCPU            Pass: 100%/3   | Total: 52m 55s | Avg: 17m 38s | Max: 36m 55s | Hits: 365%/1852  
      🟩 TestGPU            Pass: 100%/3   | Total: 37m 27s | Avg: 12m 29s | Max: 13m 20s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 18m 50s | Avg: 18m 50s | Max: 18m 50s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total:  1h 05m | Avg: 13m 11s | Max: 29m 35s
      🟩 14                 Pass: 100%/4   | Total:  2h 18m | Avg: 34m 32s | Max:  1h 03m | Hits:  88%/1852  
      🟩 17                 Pass: 100%/12  | Total:  8h 01m | Avg: 40m 09s | Max:  1h 22m | Hits:  63%/3704  
      🟩 20                 Pass: 100%/23  | Total: 12h 19m | Avg: 32m 08s | Max:  1h 17m | Hits: 214%/3704  
    
  • 🟩 cudax: Pass: 100%/26 | Total: 6h 15m | Avg: 14m 26s | Max: 19m 49s | Hits: 62%/312

    🟩 cpu
      🟩 amd64              Pass: 100%/22  | Total:  5h 21m | Avg: 14m 36s | Max: 19m 49s | Hits:  62%/312   
      🟩 arm64              Pass: 100%/4   | Total: 54m 04s | Avg: 13m 31s | Max: 14m 37s
    🟩 ctk
      🟩 12.0               Pass: 100%/3   | Total: 40m 49s | Avg: 13m 36s | Max: 15m 00s | Hits:  62%/156   
      🟩 12.5               Pass: 100%/2   | Total: 19m 47s | Avg:  9m 53s | Max: 10m 10s
      🟩 12.6               Pass: 100%/21  | Total:  5h 14m | Avg: 14m 59s | Max: 19m 49s | Hits:  62%/156   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/3   | Total: 40m 49s | Avg: 13m 36s | Max: 15m 00s | Hits:  62%/156   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 19m 47s | Avg:  9m 53s | Max: 10m 10s
      🟩 nvcc12.6           Pass: 100%/21  | Total:  5h 14m | Avg: 14m 59s | Max: 19m 49s | Hits:  62%/156   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/26  | Total:  6h 15m | Avg: 14m 26s | Max: 19m 49s | Hits:  62%/312   
    🟩 cxx
      🟩 Clang9             Pass: 100%/1   | Total: 13m 35s | Avg: 13m 35s | Max: 13m 35s
      🟩 Clang10            Pass: 100%/1   | Total: 14m 34s | Avg: 14m 34s | Max: 14m 34s
      🟩 Clang11            Pass: 100%/1   | Total: 16m 21s | Avg: 16m 21s | Max: 16m 21s
      🟩 Clang12            Pass: 100%/1   | Total: 13m 37s | Avg: 13m 37s | Max: 13m 37s
      🟩 Clang13            Pass: 100%/1   | Total: 14m 39s | Avg: 14m 39s | Max: 14m 39s
      🟩 Clang14            Pass: 100%/1   | Total: 13m 57s | Avg: 13m 57s | Max: 13m 57s
      🟩 Clang15            Pass: 100%/1   | Total: 17m 29s | Avg: 17m 29s | Max: 17m 29s
      🟩 Clang16            Pass: 100%/1   | Total: 16m 43s | Avg: 16m 43s | Max: 16m 43s
      🟩 Clang17            Pass: 100%/1   | Total: 14m 55s | Avg: 14m 55s | Max: 14m 55s
      🟩 Clang18            Pass: 100%/4   | Total: 59m 11s | Avg: 14m 47s | Max: 16m 30s
      🟩 GCC9               Pass: 100%/1   | Total: 15m 00s | Avg: 15m 00s | Max: 15m 00s
      🟩 GCC10              Pass: 100%/1   | Total: 15m 47s | Avg: 15m 47s | Max: 15m 47s
      🟩 GCC11              Pass: 100%/1   | Total: 16m 43s | Avg: 16m 43s | Max: 16m 43s
      🟩 GCC12              Pass: 100%/2   | Total: 38m 37s | Avg: 19m 18s | Max: 19m 49s
      🟩 GCC13              Pass: 100%/4   | Total: 49m 49s | Avg: 12m 27s | Max: 14m 37s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 12m 14s | Avg: 12m 14s | Max: 12m 14s | Hits:  62%/156   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 23s | Avg: 12m 23s | Max: 12m 23s | Hits:  62%/156   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 19m 47s | Avg:  9m 53s | Max: 10m 10s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/13  | Total:  3h 15m | Avg: 15m 00s | Max: 17m 29s
      🟩 GCC                Pass: 100%/9   | Total:  2h 15m | Avg: 15m 06s | Max: 19m 49s
      🟩 MSVC               Pass: 100%/2   | Total: 24m 37s | Avg: 12m 18s | Max: 12m 23s | Hits:  62%/312   
      🟩 NVHPC              Pass: 100%/2   | Total: 19m 47s | Avg:  9m 53s | Max: 10m 10s
    🟩 gpu
      🟩 v100               Pass: 100%/26  | Total:  6h 15m | Avg: 14m 26s | Max: 19m 49s | Hits:  62%/312   
    🟩 jobs
      🟩 Build              Pass: 100%/24  | Total:  5h 39m | Avg: 14m 07s | Max: 18m 48s | Hits:  62%/312   
      🟩 Test               Pass: 100%/2   | Total: 36m 19s | Avg: 18m 09s | Max: 19m 49s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  9m 52s | Avg:  9m 52s | Max:  9m 52s
      🟩 90a                Pass: 100%/1   | Total: 12m 23s | Avg: 12m 23s | Max: 12m 23s
    🟩 std
      🟩 17                 Pass: 100%/6   | Total:  1h 13m | Avg: 12m 17s | Max: 15m 00s
      🟩 20                 Pass: 100%/20  | Total:  5h 01m | Avg: 15m 04s | Max: 19m 49s | Hits:  62%/312   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 47s | Avg: 4m 53s | Max: 7m 34s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 47s | Avg:  4m 53s | Max:  7m 34s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 47s | Avg:  4m 53s | Max:  7m 34s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 47s | Avg:  4m 53s | Max:  7m 34s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 47s | Avg:  4m 53s | Max:  7m 34s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 47s | Avg:  4m 53s | Max:  7m 34s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 47s | Avg:  4m 53s | Max:  7m 34s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 47s | Avg:  4m 53s | Max:  7m 34s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 13s | Avg:  2m 13s | Max:  2m 13s
      🟩 Test               Pass: 100%/1   | Total:  7m 34s | Avg:  7m 34s | Max:  7m 34s
    
  • 🟩 python: Pass: 100%/1 | Total: 23m 51s | Avg: 23m 51s | Max: 23m 51s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 23m 51s | Avg: 23m 51s | Max: 23m 51s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 23m 51s | Avg: 23m 51s | Max: 23m 51s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 23m 51s | Avg: 23m 51s | Max: 23m 51s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 23m 51s | Avg: 23m 51s | Max: 23m 51s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 23m 51s | Avg: 23m 51s | Max: 23m 51s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 23m 51s | Avg: 23m 51s | Max: 23m 51s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 23m 51s | Avg: 23m 51s | Max: 23m 51s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 23m 51s | Avg: 23m 51s | Max: 23m 51s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 170)

# Runner
125 linux-amd64-cpu16
19 linux-amd64-gpu-v100-latest-1
15 windows-amd64-cpu16
10 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

miscco merged commit 51b08b0 into NVIDIA:branch/2.8.x on Jan 22, 2025
186 checks passed
bernhardmgruber deleted the backport_fp8_macro branch on January 22, 2025 19:26