[BUG]RAFT Failure error while running the cuVS Python SQ API #623

Closed
AmeliaYe opened this issue Jan 29, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@AmeliaYe

Describe the bug
The original embeddings work with cagra before scalar quantization (SQ), but after SQ I encounter this error. Does cagra.build support int8? Also, after quantization the embeddings are int8 and contain negative numbers; I'm not sure whether they are supposed to be int8 or uint8.
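On the int8 versus uint8 question: a signed 8-bit encoding naturally produces negative codes, because the lower part of the input range maps below zero. The sketch below is a generic min/max scalar quantizer in plain Python, not the cuVS implementation; the function name quantize_int8 and the lo/hi bounds are assumptions made purely for illustration.

```python
def quantize_int8(values, lo, hi):
    """Linearly map floats in [lo, hi] onto the signed 8-bit range [-128, 127].

    Illustrative only: a hypothetical helper, not the cuVS scalar quantizer.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scale = 255.0 / (hi - lo)  # 256 levels span 255 steps
    codes = []
    for v in values:
        clipped = min(max(v, lo), hi)            # clamp outliers into [lo, hi]
        q = round((clipped - lo) * scale) - 128  # shift so lo -> -128, hi -> 127
        codes.append(int(q))
    return codes


# Values in the lower half of the range come out negative.
codes = quantize_int8([0.0, 0.25, 1.0], lo=0.0, hi=1.0)
print(codes)
```

Under this kind of mapping, negative values in the quantized data are expected and do not by themselves indicate a problem.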


Steps/Code to reproduce bug
Link to the notebook with SQ and failed cagra.build: https://gitlab-master.nvidia.com/ameliay/cagra-umap/-/commit/7b0739622aff28c192ed49887c0327ea733390ac#c187ee3d6e010901b2354522adb927344250e1bd_0_291

The error message is also there.

Expected behavior
cagra.build runs without error so that we can continue to cagra.search and calculate the recall score.
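For reference, the recall score mentioned here is usually computed by comparing the neighbor IDs returned by the approximate search against exact (brute-force) neighbors. A minimal sketch in plain Python follows; the function name recall_at_k and the toy neighbor lists are assumptions for illustration and are not taken from the notebook.

```python
def recall_at_k(found, truth, k):
    """Average fraction of the true top-k neighbors that the search recovered.

    found and truth are lists of per-query neighbor ID lists.
    Hypothetical helper for illustration; not part of cuVS.
    """
    if len(found) != len(truth):
        raise ValueError("found and truth must cover the same queries")
    if not truth or k <= 0:
        raise ValueError("need at least one query and k > 0")
    hits = 0
    for found_row, truth_row in zip(found, truth):
        # Order within the top-k does not matter, only membership.
        hits += len(set(found_row[:k]) & set(truth_row[:k]))
    return hits / (len(truth) * k)


# Toy data: two queries, k = 3.
found = [[0, 2, 5], [1, 3, 4]]
truth = [[0, 5, 7], [1, 4, 3]]
score = recall_at_k(found, truth, k=3)
print(f"recall@3 = {score:.3f}")
```

With device arrays, the neighbor IDs would first need to be copied to the host and converted to lists (or the same logic expressed with array operations).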

Environment details (please complete the following information):

  • Environment location: Bare-metal
  • Method of RAFT install: conda


@AmeliaYe AmeliaYe added the bug Something isn't working label Jan 29, 2025
@AmeliaYe AmeliaYe changed the title from "[BUG]RAFT failure at file... The metric for NN Descent should be L2Expanded, CosineExpanded or InnerProduct" to "[BUG]RAFT Failure error while running the cuVS Python SQ API" Jan 29, 2025
@AmeliaYe
Author

AmeliaYe commented Jan 29, 2025

Error Message:

Error processing split 0: RAFT failure at file=/opt/conda/conda-bld/work/cpp/src/neighbors/detail/nn_descent.cuh line=1446: The metric for NN Descent should be L2Expanded, CosineExpanded or InnerProduct
Obtained 40 stack frames
#1 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/site-packages/cuvs/preprocessing/quantize/scalar/../../../../../.././libcuvs.so(+0x4b2d7d) [0x7f1ba893ad7d]
#2 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/site-packages/cuvs/preprocessing/quantize/scalar/../../../../../.././libcuvs.so: void cuvs::neighbors::nn_descent::detail::build<signed char, unsigned int, raft::host_device_accessor<std::experimental::default_accessor, (raft::memory_type)2> >(raft::resources const&, cuvs::neighbors::nn_descent::index_params const&, std::experimental::mdspan<signed char const, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor, (raft::memory_type)2> >, cuvs::neighbors::nn_descent::index&) +0x788 [0x7f1ba9d98118]
#3 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/site-packages/cuvs/preprocessing/quantize/scalar/../../../../../.././libcuvs.so: cuvs::neighbors::nn_descent::build(raft::resources const&, cuvs::neighbors::nn_descent::index_params const&, std::experimental::mdspan<signed char const, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor, (raft::memory_type)2> >, std::optional<std::experimental::mdspan<unsigned int, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor, (raft::memory_type)0> > >) +0xef [0x7f1ba9d9304f]
#4 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/site-packages/cuvs/preprocessing/quantize/scalar/../../../../../.././libcuvs.so: cuvs::neighbors::cagra::index<signed char, unsigned int> cuvs::neighbors::cagra::detail::build<signed char, unsigned int, raft::host_device_accessor<std::experimental::default_accessor, (raft::memory_type)2> >(raft::resources const&, cuvs::neighbors::cagra::index_params const&, std::experimental::mdspan<signed char const, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor, (raft::memory_type)2> >) +0x46c [0x7f1ba94ec82c]
#5 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/site-packages/cuvs/preprocessing/quantize/scalar/../../../../../.././libcuvs.so: cuvs::neighbors::cagra::build(raft::resources const&, cuvs::neighbors::cagra::index_params const&, std::experimental::mdspan<signed char const, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor, (raft::memory_type)2> >) +0x21 [0x7f1ba94d98c1]
#6 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/site-packages/cuvs/preprocessing/quantize/scalar/../../../../../../libcuvs_c.so: cuvsCagraBuild +0xdea [0x7f1c95eb93fa]
#7 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/site-packages/cuvs/neighbors/cagra/cagra.cpython-312-x86_64-linux-gnu.so(+0x1c6b8) [0x7f1c8f2976b8]
#8 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/site-packages/zmq/backend/cython/_zmq.cpython-312-x86_64-linux-gnu.so(+0x12d92) [0x7f1f722add92]
#9 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/site-packages/cuvs/common/resources.cpython-312-x86_64-linux-gnu.so(+0xdaee) [0x7f1c96a37aee]
#10 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python: _PyObject_MakeTpCall +0x2bb [0x559f3341e75b]
#11 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x1126a1) [0x559f3332c6a1]
#12 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python: PyEval_EvalCode +0xa1 [0x559f334d4741]
#13 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x2d5ece) [0x559f334efece]
#14 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x112f8e) [0x559f3332cf8e]
#15 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x2d099f) [0x559f334ea99f]
#16 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x2d1c57) [0x559f334ebc57]
#17 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x113e38) [0x559f3332de38]
#18 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x251adc) [0x559f3346badc]
#19 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x2515be) [0x559f3346b5be]
#20 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python: _PyObject_Call +0x12b [0x559f3344f1ab]
#21 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x113339) [0x559f3332d339]
#22 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x2d099f) [0x559f334ea99f]
#23 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/lib-dynload/_asyncio.cpython-312-x86_64-linux-gnu.so(+0x8274) [0x7f1f71524274]
#24 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/lib-dynload/_asyncio.cpython-312-x86_64-linux-gnu.so(+0x8a63) [0x7f1f71524a63]
#25 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x222fbc) [0x559f3343cfbc]
#26 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x34db0c) [0x559f33567b0c]
#27 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x1c402e) [0x559f333de02e]
#28 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x21940b) [0x559f3343340b]
#29 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x113339) [0x559f3332d339]
#30 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python: PyEval_EvalCode +0xa1 [0x559f334d4741]
#31 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x2d5ece) [0x559f334efece]
#32 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x21940b) [0x559f3343340b]
#33 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python: PyObject_Vectorcall +0x2e [0x559f334331ae]
#34 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x1126a1) [0x559f3332c6a1]
#35 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x2eb328) [0x559f33505328]
#36 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python: Py_RunMain +0x3d1 [0x559f33504ed1]
#37 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python: Py_BytesMain +0x37 [0x559f334bf0c7]
#38 in /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f1f72c70d90]
#39 in /lib/x86_64-linux-gnu/libc.so.6: __libc_start_main +0x80 [0x7f1f72c70e40]
#40 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x2a4f71) [0x559f334bef71]

@benfred
Member

benfred commented Jan 30, 2025

What's the distance metric you are using? The error message says "The metric for NN Descent should be L2Expanded, CosineExpanded or InnerProduct", which makes me think that you might be trying to run this with some other distance metric.
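To make the failing condition concrete, the check implied by the error message can be sketched as a small guard in plain Python. The three metric names are copied from the error text; the helper name check_nn_descent_metric and the example metric "L1" are hypothetical, and this is not the actual code in nn_descent.cuh.

```python
# Metric names as they appear in the RAFT error message.
NN_DESCENT_METRICS = ("L2Expanded", "CosineExpanded", "InnerProduct")


def check_nn_descent_metric(metric_name):
    """Return metric_name if the error message lists it as accepted, else raise.

    Hypothetical helper mirroring the wording of the error, for illustration only.
    """
    if metric_name not in NN_DESCENT_METRICS:
        raise ValueError(
            "The metric for NN Descent should be "
            "L2Expanded, CosineExpanded or InnerProduct; got " + repr(metric_name)
        )
    return metric_name


accepted = check_nn_descent_metric("L2Expanded")

try:
    check_nn_descent_metric("L1")  # a metric outside the accepted list
    rejected = False
except ValueError:
    rejected = True
```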

Do you have a minimal reproducer? The notebook has a bunch of extra code, but running the scalar quantizer with cagra seems to work for me:

import cupy as cp
from cuvs.neighbors import cagra
from cuvs.preprocessing.quantize import scalar

dataset = cp.random.random_sample((1024, 64), dtype=cp.float32)
quantizer = scalar.train(scalar.QuantizerParams(), dataset)
transformed = cp.array(scalar.transform(quantizer, dataset))

index = cagra.build(cagra.IndexParams(), transformed)
distances, neighbors = cagra.search(cagra.SearchParams(), index, transformed, k=10)

neighbors = neighbors.copy_to_host()
print(neighbors[:10])

@AmeliaYe
Author

Hi Ben, I'm getting the same error when running the code snippet you have here.

Regarding "What's the distance metric you are using?": I'm not specifying any metric for cagra.build. The call is simply cagra_index = cagra.build(build_params, vectors_gpu)


@cjnolet
Member

cjnolet commented Feb 2, 2025

This issue was fixed offline, so I'm going to close it. Please reopen it if the problem persists.

@cjnolet cjnolet closed this as completed Feb 2, 2025