This project is forked from lj_simd.
$ cd cuda
$ make
$ cd openacc
$ make
$ cd opencl
$ make
- OpenACC (tuned)
  - Uses the Verlet list data layout that is optimal for GPUs.
- OpenACC (naive)
  - Pragma directives are simply added to the original CPU source code.
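
The difference between the two OpenACC variants can be illustrated with a minimal sketch. This README does not spell out the tuned layout, so the code below assumes one common GPU-friendly choice: a padded neighbor list reordered from particle-major to neighbor-major, so that consecutive particles read consecutive addresses. All names (`NeighborList`, `to_neighbor_major`, `lj_force`) are hypothetical and are not taken from the repository sources.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical padded neighbor list; every name here is illustrative.
struct NeighborList {
  int n;                   // number of particles
  int max_nb;              // padded number of neighbor slots per particle
  std::vector<int> count;  // count[i]: actual number of neighbors of particle i
  std::vector<int> nb;     // particle-major storage: nb[i * max_nb + k]
};

// Assumed "tuned" layout: neighbor-major storage, out[k * n + i].
// For a fixed slot k, consecutive particles i (consecutive GPU threads)
// then read consecutive memory addresses.
std::vector<int> to_neighbor_major(const NeighborList& list) {
  assert(list.nb.size() == static_cast<std::size_t>(list.n) * list.max_nb);
  std::vector<int> out(list.nb.size(), -1);  // -1 marks padding slots
  for (int i = 0; i < list.n; ++i) {
    for (int k = 0; k < list.count[i]; ++k) {
      out[static_cast<std::size_t>(k) * list.n + i] =
          list.nb[static_cast<std::size_t>(i) * list.max_nb + k];
    }
  }
  return out;
}

// Lennard-Jones force with epsilon = sigma = 1 on every particle.
// Assumes a full neighbor list (j appears in i's list and i in j's), so each
// iteration writes only its own particle, and assumes the list holds only
// pairs inside the cutoff. Data clauses are omitted: the sketch assumes
// managed memory or an enclosing OpenACC data region.
void lj_force(int n, const int* count, const int* nb_t,
              const double* x, const double* y, const double* z,
              double* fx, double* fy, double* fz) {
// The guard only keeps compilers without OpenACC support quiet.
#ifdef _OPENACC
#pragma acc parallel loop
#endif
  for (int i = 0; i < n; ++i) {
    double ax = 0.0, ay = 0.0, az = 0.0;
    for (int k = 0; k < count[i]; ++k) {
      const int j = nb_t[k * n + i];  // neighbor-major lookup
      const double dx = x[i] - x[j];
      const double dy = y[i] - y[j];
      const double dz = z[i] - z[j];
      const double r2 = dx * dx + dy * dy + dz * dz;
      const double r6 = r2 * r2 * r2;
      // (48 / r^12 - 24 / r^6) / r^2, positive when the pair repels
      const double c = (48.0 - 24.0 * r6) / (r6 * r6 * r2);
      ax += c * dx;
      ay += c * dy;
      az += c * dz;
    }
    fx[i] = ax;
    fy[i] = ay;
    fz[i] = az;
  }
}
```

In this sketch a naive port would keep the particle-major index `nb[i * max_nb + k]` inside the same loop and only add the directive, while the tuned variant also changes the index to `nb_t[k * n + i]`. Whether this matches the layout actually used in `openacc/` should be checked against the sources.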
- icpc version 18.0.0 20170811
implementation | time [s] |
---|---|
Reference | 1.431335 |
AVX2 SIMD | 0.877171 |
- CUDA version 7.5
- PGI compiler version 16.10
implementation | time [s] |
---|---|
CUDA | 0.049346 |
OpenACC (tuned) | 0.168751 |
OpenACC (naive) | 0.305789 |
- CUDA version 8.0
- PGI compiler version 17.1
implementation | time [s] |
---|---|
CUDA | 0.017529 |
OpenACC (tuned) | 0.027165 |
OpenACC (naive) | 0.092830 |