[BUG]RAFT Failure error while running the cuVS Python SQ API #623

AmeliaYe · 2025-01-29T02:34:37Z

Describe the bug
The original embeddings work with CAGRA before scalar quantization (SQ), but after SQ I am encountering this error. Does cagra.build support int8? Also, after quantization the embeddings are int8 and contain negative numbers. I'm not sure whether they are supposed to be int8 or uint8.
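For readers wondering why negative values appear after SQ: a minimal plain-Python sketch of min/max scalar quantization into the signed int8 range is shown here. It is an illustration only, not the cuVS implementation; the helper names `train_min_max` and `quantize_int8` are hypothetical, and the library may choose its value range differently.

```python
# Illustrative sketch only: NOT the cuVS implementation.
# train_min_max and quantize_int8 are hypothetical helper names.

def train_min_max(rows):
    """Return the smallest and largest value seen in the training data."""
    flat = [x for row in rows for x in row]
    return min(flat), max(flat)


def quantize_int8(rows, lo, hi):
    """Linearly map [lo, hi] onto the signed int8 range [-128, 127]."""
    # 255 steps separate -128 and 127; guard against a zero-width range.
    scale = 255.0 / (hi - lo) if hi > lo else 1.0
    quantized = []
    for row in rows:
        q_row = []
        for x in row:
            q = round((x - lo) * scale) - 128
            # Clamp values that fall outside the trained range.
            q_row.append(max(-128, min(127, q)))
        quantized.append(q_row)
    return quantized


data = [[0.0, 0.25, 1.0], [0.1, 0.9, 0.75]]
lo, hi = train_min_max(data)
codes = quantize_int8(data, lo, hi)
print(codes)
```

Under this mapping the lower half of the input range lands on negative codes, which is consistent with a signed int8 output rather than uint8.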

Steps/Code to reproduce bug
Link to the notebook with SQ and failed cagra.build: https://gitlab-master.nvidia.com/ameliay/cagra-umap/-/commit/7b0739622aff28c192ed49887c0327ea733390ac#c187ee3d6e010901b2354522adb927344250e1bd_0_291

The error message is also there.

Expected behavior
cagra.build runs without error so that we can continue to cagra.search and calculate the recall score.
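As a sketch of the recall step mentioned above, assuming recall is measured as the fraction of exact nearest neighbors that also appear in the approximate results: the function name `recall_at_k` and the toy neighbor lists are hypothetical, and in practice the two inputs would be the neighbor indices from cagra.search and from an exact (brute-force) search, copied to the host.

```python
# Illustrative sketch: recall_at_k and the toy data are hypothetical.

def recall_at_k(found, truth):
    """Fraction of ground-truth neighbors recovered, averaged over all queries."""
    hits = 0
    total = 0
    for found_row, truth_row in zip(found, truth):
        # Order within the top-k does not matter, so compare as sets.
        hits += len(set(found_row) & set(truth_row))
        total += len(truth_row)
    return hits / total if total else 0.0


approx_neighbors = [[0, 1, 2], [3, 4, 5]]  # e.g. from the approximate index
exact_neighbors = [[0, 2, 9], [3, 4, 5]]   # e.g. from an exact search
recall = recall_at_k(approx_neighbors, exact_neighbors)
print(f"recall@3 = {recall:.3f}")
```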

Environment details (please complete the following information):

Environment location: Bare-metal
Method of RAFT install: conda
- If method of install is [Docker], provide docker pull & docker run commands used



AmeliaYe · 2025-01-29T03:07:24Z

Error Message:

Error processing split 0: RAFT failure at file=/opt/conda/conda-bld/work/cpp/src/neighbors/detail/nn_descent.cuh line=1446: The metric for NN Descent should be L2Expanded, CosineExpanded or InnerProduct
Obtained 40 stack frames
#1 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/site-packages/cuvs/preprocessing/quantize/scalar/../../../../../.././libcuvs.so(+0x4b2d7d) [0x7f1ba893ad7d]
#2 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/site-packages/cuvs/preprocessing/quantize/scalar/../../../../../.././libcuvs.so: void cuvs::neighbors::nn_descent::detail::build<signed char, unsigned int, raft::host_device_accessor<std::experimental::default_accessor, (raft::memory_type)2> >(raft::resources const&, cuvs::neighbors::nn_descent::index_params const&, std::experimental::mdspan<signed char const, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor, (raft::memory_type)2> >, cuvs::neighbors::nn_descent::index&) +0x788 [0x7f1ba9d98118]
#3 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/site-packages/cuvs/preprocessing/quantize/scalar/../../../../../.././libcuvs.so: cuvs::neighbors::nn_descent::build(raft::resources const&, cuvs::neighbors::nn_descent::index_params const&, std::experimental::mdspan<signed char const, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor, (raft::memory_type)2> >, std::optional<std::experimental::mdspan<unsigned int, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor, (raft::memory_type)0> > >) +0xef [0x7f1ba9d9304f]
#4 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/site-packages/cuvs/preprocessing/quantize/scalar/../../../../../.././libcuvs.so: cuvs::neighbors::cagra::index<signed char, unsigned int> cuvs::neighbors::cagra::detail::build<signed char, unsigned int, raft::host_device_accessor<std::experimental::default_accessor, (raft::memory_type)2> >(raft::resources const&, cuvs::neighbors::cagra::index_params const&, std::experimental::mdspan<signed char const, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor, (raft::memory_type)2> >) +0x46c [0x7f1ba94ec82c]
#5 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/site-packages/cuvs/preprocessing/quantize/scalar/../../../../../.././libcuvs.so: cuvs::neighbors::cagra::build(raft::resources const&, cuvs::neighbors::cagra::index_params const&, std::experimental::mdspan<signed char const, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor, (raft::memory_type)2> >) +0x21 [0x7f1ba94d98c1]
#6 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/site-packages/cuvs/preprocessing/quantize/scalar/../../../../../../libcuvs_c.so: cuvsCagraBuild +0xdea [0x7f1c95eb93fa]
#7 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/site-packages/cuvs/neighbors/cagra/cagra.cpython-312-x86_64-linux-gnu.so(+0x1c6b8) [0x7f1c8f2976b8]
#8 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/site-packages/zmq/backend/cython/_zmq.cpython-312-x86_64-linux-gnu.so(+0x12d92) [0x7f1f722add92]
#9 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/site-packages/cuvs/common/resources.cpython-312-x86_64-linux-gnu.so(+0xdaee) [0x7f1c96a37aee]
#10 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python: _PyObject_MakeTpCall +0x2bb [0x559f3341e75b]
#11 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x1126a1) [0x559f3332c6a1]
#12 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python: PyEval_EvalCode +0xa1 [0x559f334d4741]
#13 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x2d5ece) [0x559f334efece]
#14 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x112f8e) [0x559f3332cf8e]
#15 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x2d099f) [0x559f334ea99f]
#16 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x2d1c57) [0x559f334ebc57]
#17 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x113e38) [0x559f3332de38]
#18 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x251adc) [0x559f3346badc]
#19 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x2515be) [0x559f3346b5be]
#20 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python: _PyObject_Call +0x12b [0x559f3344f1ab]
#21 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x113339) [0x559f3332d339]
#22 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x2d099f) [0x559f334ea99f]
#23 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/lib-dynload/_asyncio.cpython-312-x86_64-linux-gnu.so(+0x8274) [0x7f1f71524274]
#24 in /home/nvidia/miniforge3/envs/cuvs-new/lib/python3.12/lib-dynload/_asyncio.cpython-312-x86_64-linux-gnu.so(+0x8a63) [0x7f1f71524a63]
#25 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x222fbc) [0x559f3343cfbc]
#26 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x34db0c) [0x559f33567b0c]
#27 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x1c402e) [0x559f333de02e]
#28 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x21940b) [0x559f3343340b]
#29 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x113339) [0x559f3332d339]
#30 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python: PyEval_EvalCode +0xa1 [0x559f334d4741]
#31 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x2d5ece) [0x559f334efece]
#32 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x21940b) [0x559f3343340b]
#33 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python: PyObject_Vectorcall +0x2e [0x559f334331ae]
#34 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x1126a1) [0x559f3332c6a1]
#35 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x2eb328) [0x559f33505328]
#36 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python: Py_RunMain +0x3d1 [0x559f33504ed1]
#37 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python: Py_BytesMain +0x37 [0x559f334bf0c7]
#38 in /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f1f72c70d90]
#39 in /lib/x86_64-linux-gnu/libc.so.6: __libc_start_main +0x80 [0x7f1f72c70e40]
#40 in /home/nvidia/miniforge3/envs/cuvs-new/bin/python(+0x2a4f71) [0x559f334bef71]

benfred · 2025-01-30T19:10:04Z

What's the distance metric you are using? The error message says "The metric for NN Descent should be L2Expanded, CosineExpanded or InnerProduct", which makes me think that you might be trying to run this with some other distance metric.

Do you have a minimal reproducer? The notebook has a lot of extra code, but running the scalar quantizer with CAGRA seems to work for me:

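import cupy as cp
from cuvs.neighbors import cagra
from cuvs.preprocessing.quantize import scalar

# Random float32 dataset on the GPU: 1024 vectors with 64 dimensions each
dataset = cp.random.random_sample((1024, 64), dtype=cp.float32)

# Train the scalar quantizer, then quantize the dataset
quantizer = scalar.train(scalar.QuantizerParams(), dataset)
transformed = cp.array(scalar.transform(quantizer, dataset))

# Build a CAGRA index on the quantized data and query it with the same vectors
index = cagra.build(cagra.IndexParams(), transformed)
distances, neighbors = cagra.search(cagra.SearchParams(), index, transformed, k=10)

# Copy the neighbor indices back to the host and show the first 10 rows
neighbors = neighbors.copy_to_host()
print(neighbors[:10])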

AmeliaYe · 2025-01-30T19:35:08Z

Hi Ben, I'm getting the same error when running the code snippet you have here.

> What's the distance metric you are using?

I'm not specifying any metric for cagra.build. The call is just cagra_index = cagra.build(build_params, vectors_gpu)

cjnolet · 2025-02-02T16:56:01Z

This issue was fixed offline, so I'm going to close it. Please reopen it if the problem persists.

AmeliaYe added the bug Something isn't working label Jan 29, 2025

AmeliaYe changed the title ~~[BUG]RAFT failure at file... The metric for NN Descent should be L2Expanded, CosineExpanded or InnerProduct~~ [BUG]RAFT Failure error while running the cuVS Python SQ API Jan 29, 2025

cjnolet closed this as completed Feb 2, 2025
