Avoid limited memory adaptor issue in balanced KMeans #2570

csadorf · 2025-02-04T21:27:42Z

Switch to the use of get_large_workspace_resource instead of get_workspace_resource
Do not use explicit managed memory allocation.

Based on #2541 (diff); merge after it.

copy-pr-bot · 2025-02-04T21:27:46Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

cjnolet · 2025-02-04T21:30:32Z

@csadorf we will also want to make sure we port this over to cuVS, since the kmeans in raft will be getting ported shortly after GTC.

achirkin

We have a couple of problems with kmeans_balanced here

cpp/include/raft/cluster/detail/kmeans_balanced.cuh

achirkin · 2025-02-05T08:00:58Z

cpp/include/raft/cluster/detail/kmeans_balanced.cuh

-  rmm::mr::managed_memory_resource managed_memory;
-  rmm::device_async_resource_ref device_memory = resource::get_workspace_resource(handle);
+  rmm::device_async_resource_ref current_device_resource = rmm::mr::get_current_device_resource();
+  rmm::device_async_resource_ref workspace_resource =


This is one of those rare cases where we do indeed need to explicitly use rmm::mr::managed_memory_resource (the removed TODO comment is actually incorrect).
The need for managed memory here has nothing to do with the memory limit or the user's choice; it is part of the algorithm. We use the managed_memory variable across this file for not-so-big allocations that are accessed by both device and host (see, for example, the build_fine_clusters function above). Hence, using device-only memory simply breaks the algorithm here.

Addressed.

However, I'm wondering whether we should be generally using make_managed_vector instead then. @achirkin Was there a specific motivation for the use of rmm::uvectors instead?

No, only historic reasons: the balanced kmeans code arrived earlier than these managed helpers.

Thanks for clarifying.

@cjnolet Considering that this code has moved to cuVS anyways I assume there is no point in refactoring this, is there?

No reason to refactor; we just need to fix any issues that are blocking cuML UMAP at the moment.

copy-pr-bot · 2025-02-05T16:57:23Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

achirkin

Thanks for the extra comment, LGTM

csadorf · 2025-02-05T17:08:28Z

Keeping this in draft mode until #2541 is merged.

csadorf · 2025-02-05T17:15:20Z

> @csadorf we will also want to make sure we port this over to cuVS, since the kmeans in raft will be getting ported shortly after GTC.

Prepared in rapidsai/cuvs#659.

cjnolet

Approving, as this code is deprecated anyways (and will be removed once cuML is using cuVS for this).

cjnolet · 2025-02-05T19:27:53Z

> Prepared in rapidsai/cuvs#659.

Thanks so much @csadorf!

viclafargue and others added 30 commits January 14, 2025 15:04

Fix sparse utilities

2e06ab9

Additionnal fixes

3cd3880

Revert coo_remove_scalar_kernel code

347ea4e

Merge branch 'branch-25.02' into fix-sparse-utilities

f2d6e09

Merge branch 'branch-25.02' into fix-sparse-utilities

c087b50

check style

92e4af1

compilation fix

dc72acc

fix tests

8e58d29

FIX style fixes

c75cd87

changes so far

694d371

Exposing more templates for sparse types

4881313

Adding explicit types ot COO object

ccb3b93

Updating COO object and sparse primitives to include an NNZ type

b831e40

More updates

aed7686

Merge branch 'branch-25.02' into fix-sparse-utilities

786b9ce

completing change

a58ffcb

fixing issues

1d10f72

some updates

2d6f2dc

merge

eb919d1

working through updates for cuvs

cbae315

working through updates for cuvs

485d042

lanczos tests updates

d377a2c

missing ;

46db162

tons of updates to lanczos/eigen

dd39780

Merge branch 'branch-25.02' into fix-sparse-utilities

c4a3497

Merge branch 'branch-25.04' into fix-sparse-utilities

060426c

some fixes for cuml

f17e921

Merge remote-tracking branch 'upstream/branch-25.04' into fix-sparse-…

594080c

…utilities

Merge branch 'fix-sparse-utilities' of github.com:viclafargue/raft in…

04aaf9a

…to fix-sparse-utilities

fixes for cuml

5b30389

divyegala and others added 5 commits February 3, 2025 23:43

more cuml fixes

a444f90

Merge remote-tracking branch 'upstream/branch-25.04' into fix-sparse-…

b1ebf87

…utilities

Use large workspace resource instead of workspace resource.

74a2b6c

Use current device resource instead of explicit managed memory resource.

be4586d

Rename corresponding variable names.

27307ab

github-actions bot added the cpp label Feb 4, 2025

cjnolet requested a review from achirkin February 4, 2025 21:29

achirkin requested changes Feb 5, 2025

csadorf changed the base branch from branch-25.04 to pull-request/2541 February 5, 2025 14:31

csadorf added 3 commits February 5, 2025 07:29

Revert "Rename corresponding variable names."

3c3e902

This reverts commit 27307ab.

Revert "Use current device resource instead of explicit managed memor…

d554372

…y resource." This reverts commit be4586d.

Update corresponding inline comment.

847e21e

achirkin approved these changes Feb 5, 2025

csadorf changed the title ~~[WIP] Avoid limited memory adaptor issue in balanced KMeans~~ Avoid limited memory adaptor issue in balanced KMeans Feb 5, 2025

cjnolet approved these changes Feb 5, 2025

cjnolet changed the base branch from pull-request/2541 to branch-25.04 February 5, 2025 19:26

csadorf mentioned this pull request Feb 5, 2025

Use large workspace resource for balanced kmeans rapidsai/cuvs#659

Open
