Associate cluster label with centroid #19

Closed
vihu wants to merge 2 commits from rg/centroid-labels

Contributor

@vihu vihu commented Sep 10, 2024

Summary

It seems to me that there is currently no way to get the centroid of a cluster together with its label. I need this for some external usage.

This patch makes the center calculation also return the associated cluster label, making it easier to identify each cluster's centroid.

Owner

@tom-whitehead tom-whitehead left a comment

Hey @vihu, I can't merge this PR because it changes the library's API, which makes it a breaking change that would require a major version bump.

I've double-checked the existing centroid calculation and it looks fine to me. I manually calculated the centroids on some test data to verify this. The centroid vector is returned in label order. For example, if you want the centroid of the cluster with label 0, you can just do:

// Centroids come back in label order, so index 0 is the centroid of cluster 0.
let centroids = clusterer.calc_centers(Center::Centroid, &labels).unwrap();
let centroid = &centroids[0];

I took this from the Python HDBSCAN implementation. Let me know if it's not clear.

I will update the docstring when I get a chance to publish a new version to make it more obvious.
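
To make the label-order convention concrete, here is a minimal standalone sketch. It does not use the crate and is not its implementation; `centroids_by_label` and `centroid_for_label` are hypothetical helpers written only for illustration. It assumes labels are contiguous integers starting at 0 and that -1 marks noise points, which are skipped.

```rust
/// Hypothetical helper (not part of the hdbscan crate): computes one centroid
/// per cluster and returns them in label order, so `result[k]` is the centroid
/// of the cluster labelled `k`. Assumes labels are 0..n_clusters, with -1 = noise.
fn centroids_by_label(data: &[Vec<f64>], labels: &[i32]) -> Vec<Vec<f64>> {
    // Number of clusters is the largest label plus one (zero if there are none).
    let n_clusters = labels
        .iter()
        .copied()
        .max()
        .map_or(0, |m| (m + 1).max(0) as usize);
    let dims = data.first().map_or(0, |p| p.len());

    let mut sums = vec![vec![0.0; dims]; n_clusters];
    let mut counts = vec![0usize; n_clusters];

    // Accumulate per-cluster coordinate sums, skipping noise points.
    for (point, &label) in data.iter().zip(labels) {
        if label < 0 {
            continue;
        }
        let k = label as usize;
        counts[k] += 1;
        for (s, x) in sums[k].iter_mut().zip(point) {
            *s += x;
        }
    }

    // Turn the sums into means.
    for (sum, &count) in sums.iter_mut().zip(&counts) {
        if count > 0 {
            for s in sum.iter_mut() {
                *s /= count as f64;
            }
        }
    }
    sums
}

/// Hypothetical helper: looks up a centroid using the cluster label as the index.
/// Returns `None` for noise (-1) or for a label with no centroid.
fn centroid_for_label(centroids: &[Vec<f64>], label: i32) -> Option<&Vec<f64>> {
    if label < 0 {
        None
    } else {
        centroids.get(label as usize)
    }
}
```

With a vector ordered this way, a separate label-to-centroid mapping is unnecessary: the label itself is the index, and a bounds-checked lookup such as `centroid_for_label` avoids panicking on the noise label.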

Contributor Author

vihu commented Sep 10, 2024

Ah, I didn't realize that; it makes sense. Thanks for the information! I'll close this PR since I can just look up the centroids using the cluster label as the index.

@vihu vihu closed this Sep 10, 2024
@vihu vihu deleted the rg/centroid-labels branch September 10, 2024 19:49
@vihu vihu restored the rg/centroid-labels branch September 10, 2024 20:02