V2 fails to prevent invalid wgsizes from launching #29

Open
tom91136 opened this issue Oct 27, 2023 · 1 comment

tom91136 (Member) commented:

If we try to launch the benchmark with a non-existent kernel WGSIZE, the program actually gives an invalid result instead of reporting this and terminating early:

```yaml
miniBUDE:
compile_commands:
   - "/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/bin/nvcc -forward-unknown-to-host-compiler -DCUDA -DMEM=MANAGED -DUSE_PPWI="1\\,2\\,4\\,8\\,16\\,32\\,64\\,128" --options-file <OUT>/includes_CUDA.rsp  -std=c++17 -forward-unknown-to-host-compiler -arch=sm_61 -use_fast_math -restrict -keep   -DNDEBUG -std=c++17 -O3 -march=native -x cu -c <SRC>/main.cpp -o <OUT>/src/main.cpp.o"
vcs:
  commit:  e7339d6cd9b832f0ba59ed73d2bc406e4345d495*
  author:  "Tom Lin ([email protected])"
  date:    "2023-10-02 15:21:22 +0100"
  subject: "Prevent NVHPC from optimising away task barrier (likely a bug)"
host_cpu:
  ~
time: { epoch_s:1698373309, formatted: "Fri Oct 27 02:21:49 2023 GMT" }
deck:
  path:         "../data/bm1"
  poses:        65536
  proteins:     938
  ligands:      26
  forcefields:  34
config:
  iterations:   8
  poses:        65536
  ppwi:
    available:  [1,2,4,8,16,32,64,128]
    selected:   [64]
  wgsize:       [512]
device: { index: 0,  name: "NVIDIA TITAN X (Pascal) (12189MB;sm_61)" }
# Device and kernel cc: sm_61
# Verification failed for ppwi=64, wgsize=512; difference exceeded tolerance (0.025%)
# Bad energies (failed/total=58671/65536, showing first 8): 
# index,actual,expected,difference_%
# 0,0,865.523,100
# 1,0,25.0715,100
# 2,0,368.434,100
# 3,0,14.6651,100
# 4,0,574.987,100
# 5,0,707.354,100
# 6,0,33.947,100
# 7,0,135.588,100
# (ppwi=64,wgsize=512,valid=0)
results:
  - outcome:             { valid: false, max_diff_%: 100.000 }
    param:               { ppwi: 64, wgsize: 512 }
    raw_iterations:      [3.50847,0.00114,0.00047,0.00039,0.00041,0.00038,0.00036,0.00037,0.00034,0.00039]
    context_ms:          0.635100
    sum_ms:              0.003
    avg_ms:              0.000
    min_ms:              0.000
    max_ms:              0.000
    stddev_ms:           0.000
    giga_interactions/s: 4111361.976
    gflop/s:             124067012.898
    gfinst/s:            102784049.389
    energies:            
      - 0.00
      - 0.00
      - 0.00
      - 0.00
      - 0.00
      - 0.00
      - 0.00
      - 0.00
best: { min_ms: 0.00, max_ms: 0.00, sum_ms: 0.00, avg_ms: 0.00, ppwi: 64, wgsize: 512 }
```

We also need to add a hint to the error message so that the user knows how to add the missing WGSIZE.
Thanks to @jhdavis8 for discovering this.
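
This looks like an unchecked launch failure: if the block dimension is invalid for the kernel, the launch fails, nothing runs, and the untouched result buffer is read back, which would explain the all-zero energies above. A minimal sketch of the kind of guard V2 could add, assuming a CUDA runtime launch path (`fasten_main` and `launch` are placeholder names here, not the actual miniBUDE source):

```cpp
// Minimal sketch, not miniBUDE's actual code: `fasten_main` is a placeholder
// for the real kernel; the checks around the launch are the point.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                  \
  do {                                                                    \
    cudaError_t err_ = (call);                                            \
    if (err_ != cudaSuccess) {                                            \
      std::fprintf(stderr, "CUDA error: %s at %s:%d\n",                   \
                   cudaGetErrorString(err_), __FILE__, __LINE__);         \
      std::exit(EXIT_FAILURE);                                            \
    }                                                                     \
  } while (0)

__global__ void fasten_main() { /* stand-in for the real kernel */ }

void launch(int nblocks, int wgsize) {
  // Reject wgsizes this kernel can never launch with, before launching.
  cudaFuncAttributes attrs{};
  CUDA_CHECK(cudaFuncGetAttributes(&attrs, fasten_main));
  if (wgsize < 1 || wgsize > attrs.maxThreadsPerBlock) {
    std::fprintf(stderr,
                 "wgsize=%d is invalid for this kernel (max %d per block)\n",
                 wgsize, attrs.maxThreadsPerBlock);
    std::exit(EXIT_FAILURE);
  }
  fasten_main<<<nblocks, wgsize>>>();
  // An invalid configuration surfaces here as a launch error rather than
  // as a silently zeroed result buffer.
  CUDA_CHECK(cudaGetLastError());
  CUDA_CHECK(cudaDeviceSynchronize());
}
```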

tom91136 self-assigned this on Oct 27, 2023
tom91136 (Member, Author) commented:

Update: it's CUDA's wgsize (which propagates to threads per block) that's failing; PPWI is the one that's defined at compile time.
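
To make that split concrete, a minimal illustrative sketch (hypothetical names, not the actual miniBUDE source): PPWI selects one of the kernel template instantiations fixed at build time by USE_PPWI, whereas wgsize is only the runtime block dimension, so it is the one that needs a launch-time check:

```cpp
// Illustrative only: PPWI is a compile-time template parameter,
// wgsize is just the runtime launch configuration.
template <int PPWI>
__global__ void fasten_main() {
  // PPWI (poses per work-item) is baked into the generated code here.
}

template <int PPWI>
void launch(int nblocks, int wgsize) {
  // wgsize becomes blockDim.x only at launch time; nothing validates it
  // at compile time, so the check has to happen around the launch itself.
  fasten_main<PPWI><<<nblocks, wgsize>>>();
}
```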
