Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid limited memory adaptor issue in balanced KMeans #2570

Draft
wants to merge 38 commits into
base: branch-25.04
Choose a base branch
from

Conversation

csadorf
Copy link
Contributor

@csadorf csadorf commented Feb 4, 2025

  • Switch to the use of get_large_workspace_resource instead of get_workspace_resource
  • Do not use explicit managed memory allocation.

Based on and merge after #2541 (diff)

Copy link

copy-pr-bot bot commented Feb 4, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the cpp label Feb 4, 2025
@cjnolet cjnolet requested a review from achirkin February 4, 2025 21:29
@cjnolet
Copy link
Member

cjnolet commented Feb 4, 2025

@csadorf we will also want to make sure we port this over to cuVS, since the kmeans in raft will be getting ported shortly after GTC.

Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have couple problem with kmeans_balanced here

cpp/include/raft/cluster/detail/kmeans_balanced.cuh Outdated Show resolved Hide resolved
rmm::mr::managed_memory_resource managed_memory;
rmm::device_async_resource_ref device_memory = resource::get_workspace_resource(handle);
rmm::device_async_resource_ref current_device_resource = rmm::mr::get_current_device_resource();
rmm::device_async_resource_ref workspace_resource =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of those rare cases where we do indeed need to explicitly allocate rmm::mr::managed_memory_resource (the removed TODO comment is actually incorrect).
The need to use managed memory here has nothing to do with the memory limit and the user choice, but rather is a part of the algorithm. We use the managed_memory variable across this file for not-so-big allocations that are accessed by both device and host (see, for example, the build_fine_clusters function above). Hence, using the device-only memory simply breaks the algorithm here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adressed.

However, I'm wondering whether we should be generally using make_managed_vector instead then. @achirkin Was there a specific motivation for the use of rmm::uvectors instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, only historic reasons: the balanced kmeans code arrived earlier than these managed helpers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying.

@cjnolet Considering that this code has moved to cuVS anyways I assume there is no point in refactoring this, is there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason to refactor, just need to fix any issues that is blocking cuML UMAP ATM

@csadorf csadorf changed the base branch from branch-25.04 to pull-request/2541 February 5, 2025 14:31
Copy link

copy-pr-bot bot commented Feb 5, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the extra comment, LGTM

@csadorf csadorf changed the title [WIP] Avoid limited memory adaptor issue in balanced KMeans Avoid limited memory adaptor issue in balanced KMeans Feb 5, 2025
@csadorf
Copy link
Contributor Author

csadorf commented Feb 5, 2025

Keeping this in draft mode until #2541 is merged.

@csadorf
Copy link
Contributor Author

csadorf commented Feb 5, 2025

@csadorf we will also want to make sure we port this over to cuVS, since the kmeans in raft will be getting ported shortly after GTC.

Prepared in rapidsai/cuvs#659 .

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, as this code is deprecated anyways (and will be removed once cuML is using cuVS for this).

@cjnolet cjnolet changed the base branch from pull-request/2541 to branch-25.04 February 5, 2025 19:26
@cjnolet
Copy link
Member

cjnolet commented Feb 5, 2025

Prepared in rapidsai/cuvs#659 .

Thanks so much @csadorf!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants