Better cuBLAS handle management #1389

ptrendx · 2025-01-02T23:36:12Z

Description

Create only one cublasLtHandle per thread so that handles (and their memory) are not leaked. We do not want to actually destroy the handle, since doing so would incur an implicit cudaDeviceSynchronize call (see the cuBLAS docs).

Fixes #1372

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactor

Changes

Introduced a singleton manager for cuBLAS handles similar to the one used for cuDNN handles.
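A minimal C++ sketch of the per-thread singleton idea, not the PR's actual code. To keep it buildable without CUDA, FakeHandle and CreateHandle are hypothetical stand-ins for cublasLtHandle_t and cublasLtCreate, and HandleManager is an illustrative name. Each thread creates its handle lazily on first use, and the handle is deliberately never destroyed.

```cpp
#include <cassert>

// Hypothetical stand-in for cublasLtHandle_t so the sketch builds without CUDA.
struct FakeHandle {
  int id;
};

// Counts created handles; only here to make the "create once" behavior observable.
int g_handles_created = 0;

// Hypothetical stand-in for cublasLtCreate(&handle); real code would check the status.
FakeHandle* CreateHandle() {
  ++g_handles_created;
  return new FakeHandle{g_handles_created};
}

// Illustrative singleton-style manager: one lazily created handle per thread.
template <typename Handle, Handle (*Create)()>
class HandleManager {
 public:
  static HandleManager& Instance() {
    // thread_local: every thread gets its own manager, hence its own handle.
    static thread_local HandleManager instance;
    return instance;
  }

  Handle GetHandle() {
    if (!initialized_) {
      handle_ = Create();
      initialized_ = true;
    }
    return handle_;
  }

 private:
  HandleManager() = default;
  // Intentionally no cleanup of handle_: the handle lives for the rest of the
  // thread, so destroying it never triggers an implicit device synchronization.
  Handle handle_{};
  bool initialized_ = false;
};

using FakeHandleManager = HandleManager<FakeHandle*, CreateHandle>;
```

Calling FakeHandleManager::Instance().GetHandle() repeatedly on one thread returns the same pointer and creates a single handle; another thread would get its own. Keeping one handle alive per thread is the trade-off accepted here in exchange for never paying the synchronization on destroy.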

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes


ksivaman

LGTM

shenzhenghai · 2025-01-04T08:07:07Z

May I ask a question about this commit? If multiple devices invoke cublas_gemm(), it seems the same cublasLt handle is always used.
The documentation says "one cuBLASLt handle should be created for each device".

shenzhenghai · 2025-01-06T03:59:25Z


@ptrendx @ksivaman any comment on my concern? Thanks.

ptrendx · 2025-01-06T17:36:03Z

@shenzhenghai Yeah, if your usage is to have multiple devices per thread, then you are correct: it would use the same handle, which is wrong. I don't necessarily like that fixing this will require calling cudaGetDevice before every cuBLAS call; this adds overhead, and the most typical usage by DL frameworks nowadays is 1 GPU per process. But I agree that functional correctness is the most important, so I will submit a fix (and will also fix the cuDNN handle management, since it has a similar problem).
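The fix discussed here, keying the handle on the current device, can be sketched as follows. This is an illustrative assumption, not the merged code: FakeHandle, GetCurrentDevice and CreateHandleForDevice are hypothetical stand-ins for cublasLtHandle_t, cudaGetDevice and cublasLtCreate, and g_current_device simulates the per-host-thread current device that cudaSetDevice changes. Each thread keeps a map from device index to handle, so the extra per-call cost is one device query plus one lookup.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for cublasLtHandle_t; remembers which device it belongs to.
struct FakeHandle {
  int device;
};

// Hypothetical stand-in for the CUDA current device, which is per host thread.
thread_local int g_current_device = 0;

int GetCurrentDevice() {
  // Real code would call cudaGetDevice(&device) and check the returned error.
  return g_current_device;
}

FakeHandle* CreateHandleForDevice(int device) {
  // Real code would call cublasLtCreate while `device` is the current device.
  return new FakeHandle{device};
}

// One handle per (thread, device) pair; handles are never destroyed.
FakeHandle* GetHandle() {
  thread_local std::unordered_map<int, FakeHandle*> handles;
  const int device = GetCurrentDevice();  // the extra query on every call
  auto it = handles.find(device);
  if (it == handles.end()) {
    it = handles.emplace(device, CreateHandleForDevice(device)).first;
  }
  return it->second;
}
```

Switching g_current_device from 0 to 1 yields a different handle, and switching back returns the original one, so a thread driving several GPUs never shares a handle across devices.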


ptrendx · 2025-01-06T22:52:29Z

/te-ci


ptrendx · 2025-01-07T00:09:09Z

/te-ci

timmoon10

LGTM

ptrendx · 2025-01-07T22:15:44Z

/te-ci

Do not create multiple cublas handle

999879e

Signed-off-by: Przemek Tredak <[email protected]>

ptrendx assigned ksivaman Jan 2, 2025

[pre-commit.ci] auto fixes from pre-commit.com hooks

7b868b0

for more information, see https://pre-commit.ci

ksivaman approved these changes Jan 3, 2025


ptrendx and others added 2 commits January 6, 2025 14:50

Fix for multiple GPUs per thread

529aec2

Signed-off-by: Przemek Tredak <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

44c7d70

for more information, see https://pre-commit.ci

Fix multithreaded execution

e62a128

Signed-off-by: Przemek Tredak <[email protected]>

timmoon10 approved these changes Jan 7, 2025

