Perf: optimize the stream strategy in module_gint #5845

dzzz2001 · 2025-01-10T07:48:28Z

Background

While testing the LCAO GPU version of ABACUS on an A800 GPU, I noticed a large performance difference between two launch configurations on a machine with only 16 cores: OMP_NUM_THREADS=4 mpirun -n 4 versus OMP_NUM_THREADS=1 mpirun -n 16. With the latter, cal_gint can take approximately 8 times as long as with the former. Below are the runtime statistics I collected (the test case is si256):

| command | cal_gint_vl | cal_gint_rho | cal_gint_force |
| --- | --- | --- | --- |
| omp 4 mpirun 4 | 15.49 | 14.39 | 2.30 |
| omp 1 mpirun 16 | 114.35 | 113.6 | 19.25 |

After reviewing the code, I found that the performance difference might be due to the OpenMP thread-setting strategy in the GPU code of module_gint:

From the code, it is evident that the grid integration code launches num_stream parallel threads (num_stream is typically 4) regardless of whether the system has enough cores. This likely oversubscribes the available cores (for example, 16 MPI processes each spawning 4 threads on a 16-core machine) and reduces efficiency. Therefore, I modified the thread settings here to address this issue.
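As a rough illustration of the thread cap, here is a minimal sketch, not the actual ABACUS change. The helper names `capped_stream_threads`, `host_thread_budget`, and `for_each_stream` are hypothetical, and the sketch assumes the per-process thread budget is whatever `omp_get_max_threads()` reports (which follows `OMP_NUM_THREADS`).

```cpp
#include <algorithm>
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (not ABACUS code): never use more host threads than
// there are streams, and never more than the process is allowed to use.
inline int capped_stream_threads(int num_streams, int host_thread_budget)
{
    assert(num_streams > 0);
    return std::max(1, std::min(num_streams, host_thread_budget));
}

// Thread budget of this process: omp_get_max_threads() when built with
// OpenMP, otherwise a single thread.
inline int host_thread_budget()
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Distribute the per-stream work over the capped number of host threads.
// With a budget of 1, a single thread handles all streams in turn.
template <typename Work>
void for_each_stream(int num_streams, Work work)
{
    const int nthreads = capped_stream_threads(num_streams, host_thread_budget());
#pragma omp parallel for num_threads(nthreads)
    for (int s = 0; s < num_streams; ++s)
    {
        work(s);
    }
}
```

Under this assumption, a run with OMP_NUM_THREADS=1 and 16 MPI processes would keep one host thread per process instead of four, so the total thread count stays at the 16 available cores.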
Additionally, the stream synchronization strategy in module_gint was previously rather coarse. I have now reworked it using CUDA events, which has resulted in some performance gains. After completing all modifications, I re-measured the runtime for the same test case, with the following results:

| command | cal_gint_vl | cal_gint_rho | cal_gint_force |
| --- | --- | --- | --- |
| omp 4 mpirun 4 | 10.99 | 9.97 | 3.99 |
| omp 1 mpirun 16 | 28.60 | 28.60 | 9.14 |

dzzz2001 added 2 commits January 9, 2025 12:21

optimize stream strategy

b3b948c

limit max threads

2acab4b

dzzz2001 requested review from mohanchen and goodchong January 10, 2025 07:50

Merge branch 'develop' into develop

e34e1b4

mohanchen approved these changes Jan 10, 2025


mohanchen added the GPU & DCU & HPC and Refactor labels Jan 10, 2025

mohanchen merged commit 16714c6 into deepmodeling:develop Jan 10, 2025
14 checks passed
