Inner Product for CAGRA-Q #458

Open
wants to merge 24 commits into base: branch-25.02

tarang-jain (Contributor)

Partially addresses #198 (Cosine still pending for CAGRA and CAGRA-Q).

github-actions bot added the cpp label Nov 10, 2024
tarang-jain added the feature request and non-breaking labels Nov 11, 2024
tarang-jain self-assigned this Nov 11, 2024
github-actions bot added the CMake label Nov 12, 2024

achirkin (Contributor) left a comment:

Thanks for the PR. It's nice to see only minimal changes are needed to add a new metric.
There are (as expected) many new files, though. Could you please check that the binary size doesn't blow up?

{false},
{true},
{0.6}); // don't demand high recall without refinement
{0.55}); // don't demand high recall without refinement

Please justify lowering the recall threshold here. Why is this not a bug in the build or search functions?
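
For readers unfamiliar with the threshold under discussion, here is a minimal sketch of how recall@k is commonly computed for an ANN test. It is not the cuVS test harness: the function name `recall_at_k` and the row-major id layout are assumptions made for illustration, and it assumes the ground-truth ids within a row are distinct.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Recall@k over a batch of queries: the fraction of ground-truth neighbor ids
// that also appear in the ANN result row for the same query.
// Hypothetical layout: both inputs are row-major [n_queries x k] id matrices.
double recall_at_k(const std::vector<std::int64_t>& found,
                   const std::vector<std::int64_t>& truth,
                   std::size_t n_queries,
                   std::size_t k)
{
  assert(k > 0 && found.size() == n_queries * k && truth.size() == n_queries * k);
  if (n_queries == 0) { return 0.0; }

  std::size_t hits = 0;
  for (std::size_t q = 0; q < n_queries; ++q) {
    const auto row_begin = found.begin() + static_cast<std::ptrdiff_t>(q * k);
    const auto row_end   = row_begin + static_cast<std::ptrdiff_t>(k);
    for (std::size_t j = 0; j < k; ++j) {
      // Count a hit when the j-th true neighbor is anywhere in the result row.
      if (std::find(row_begin, row_end, truth[q * k + j]) != row_end) { ++hits; }
    }
  }
  return static_cast<double>(hits) / static_cast<double>(n_queries * k);
}
```

A test would then compare the returned value against the minimum recall parameter (0.6 before this change, 0.55 after), so lowering that parameter loosens the check rather than changing what is measured.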

Comment on lines +296 to +297
half2 dist = dist_op<half2, DescriptorT::kMetric>(
q2, c2 + reinterpret_cast<half2(&)[PQ_LEN * vlen / 2]>(vq_vals)[d1]);

Please benchmark this before and after the change, at least for single_cta and multi_cta, itopk in the range 32...512, and a couple of PQ configs. I'm specifically concerned about increased register usage and spilling, which could show up as a significant drop in QPS in some cases.
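
As context for the `dist_op<half2, DescriptorT::kMetric>` call quoted above, the following host-side sketch shows the general pattern of selecting a per-element distance operation through a compile-time metric parameter. It is an illustration only, not the cuVS implementation: the `Metric` enum, the `float` element type, and `compute_distance` are hypothetical, and negating the product for inner product (so that smaller values always mean closer) is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the library's metric enum.
enum class Metric { L2Expanded, InnerProduct };

// Per-element contribution to the distance, chosen at compile time.
// Because M is a template parameter, the compiler can fold the branch away.
template <typename T, Metric M>
constexpr T dist_op(T a, T b)
{
  // Assumption: inner product is negated so that "smaller is closer"
  // holds for both metrics.
  return M == Metric::L2Expanded ? (a - b) * (a - b) : -(a * b);
}

// Accumulate the per-element contributions over one query/candidate pair.
template <Metric M>
float compute_distance(const float* query, const float* candidate, std::size_t dim)
{
  assert(query != nullptr && candidate != nullptr);
  float acc = 0.0f;
  for (std::size_t i = 0; i < dim; ++i) {
    acc += dist_op<float, M>(query[i], candidate[i]);
  }
  return acc;
}
```

In a sketch like this the metric adds a template instantiation rather than a runtime branch, but the arithmetic inside the loop still differs per metric, which is why register usage needs to be measured rather than assumed.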


cpp/src/neighbors/detail/vpq_dataset.cuh (outdated, resolved)

copy-pr-bot bot commented Nov 19, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

cjnolet (Member) commented Nov 20, 2024

/ok to test

tarang-jain changed the base branch from branch-24.12 to branch-25.02 January 10, 2025 07:52
tarang-jain marked this pull request as ready for review January 10, 2025 11:35
tarang-jain requested review from a team as code owners January 10, 2025 11:35

tarang-jain (Contributor Author) commented Jan 11, 2025

Binary size comparison (built from source for all CUDA archs):
Although this PR adds several new source files for the new template instantiations, the binary sizes of the libcuvs objects remain the same.
Size of libcuvs.so (branch-25.02): 852M
Size of libcuvs.so (Pull Request): 852M
Size of libcuvs_static.a (branch-25.02): 635M
Size of libcuvs_static.a (Pull Request): 635M
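
Sizes rounded to whole megabytes can hide differences below 1M, so a byte-level comparison may be a useful follow-up. The sketch below shows one way to do that. The helper names `file_size_bytes` and `growth_percent` are hypothetical and not part of any cuVS tooling.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Size of a file in bytes, or -1 if it cannot be opened.
long long file_size_bytes(const std::string& path)
{
  // Open positioned at the end so tellg() reports the total size.
  std::ifstream in(path, std::ios::binary | std::ios::ate);
  if (!in) { return -1; }
  return static_cast<long long>(in.tellg());
}

// Relative growth of the PR build over the base build, in percent.
double growth_percent(long long base_bytes, long long pr_bytes)
{
  assert(base_bytes > 0 && pr_bytes >= 0);
  return 100.0 * static_cast<double>(pr_bytes - base_bytes) /
         static_cast<double>(base_bytes);
}
```

Calling `growth_percent(file_size_bytes(base_path), file_size_bytes(pr_path))` on the two builds of `libcuvs.so` (with both paths supplied by the user) would report any change that the rounded figures above cannot show.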

cjnolet (Member) commented Jan 15, 2025

/ok to test

Labels
CMake, cpp, feature request, non-breaking, Python
Projects
Status: In Progress
3 participants