Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Fix CAGRA graph optimization bug #565

Merged

Conversation

enp1s0
Copy link
Member

@enp1s0 enp1s0 commented Jan 11, 2025

The merging process of the pruned and revedge graphs is inappropriate, resulting in neighbor index duplication. This PR fixes the issue and adds the validation of graph indices at the end of the graph::optimze function.

The simplest solution to fix this issue is 1151511, but it ignores the memory space optimization by rapidsai/raft@12480cf. The code changes in this PR apply the memory space optimization to the simplest solution to remove the memory space for the pruned graph (pruned_graph).

This PR also fixes a problem that throwing an exception from an OMP loop does not work as expected.

This is the search performance comparison between the graph optimization that has a bug (branch-25.02) and fixed (fix-cagra-graph-optimization-bug). Each point represents the search performance for itopk=32, 64, ... 512. By this PR, the recall becomes slightly higher when searching with a small itopk size. Because the number of duplicated nodes by this bug is not so large (typically less than 100 in a 1M dataset), the performances are almost the same when searching with a large itopk size and traversing the graph sufficiently.
fix-cagra-graph-optimization-bug

@enp1s0 enp1s0 requested a review from a team as a code owner January 11, 2025 06:51
@enp1s0 enp1s0 self-assigned this Jan 11, 2025
@github-actions github-actions bot added the cpp label Jan 11, 2025
@enp1s0 enp1s0 added bug Something isn't working non-breaking Introduces a non-breaking change and removed cpp labels Jan 11, 2025
@enp1s0 enp1s0 changed the title Fix CAGRA graph optimization bug [BUG] Fix CAGRA graph optimization bug Jan 11, 2025
@github-actions github-actions bot added the cpp label Jan 11, 2025
@enp1s0
Copy link
Member Author

enp1s0 commented Jan 12, 2025

The new optimize function gets stuck in cagra_extreme_inputs_oob_test because there are some invalid index (~0u) nodes in the initial knn graph. It would be better to check the invalid nodes first in the function and abort the process if there are any.

It seems too severe to abort the optimization process if there is even one invalid node, so I updated the code not to abort unless the pruned graph can be generated even if there are invalid nodes in the initial kNN graph.

@enp1s0
Copy link
Member Author

enp1s0 commented Jan 13, 2025

@anaruse Can you review the update? It is related to merging the pruned graph and MST optimization edges.

@cjnolet
Copy link
Member

cjnolet commented Jan 16, 2025

Approving, but waiting to merge until you confirm it's ready @enp1s0

@enp1s0
Copy link
Member Author

enp1s0 commented Jan 17, 2025

@cjnolet Thank you for approving. I think it's ready to be merged. Akira and I found a problem with this fix. Please wait a moment.

@enp1s0
Copy link
Member Author

enp1s0 commented Jan 17, 2025

@anaruse Can you review the code again?

@anaruse
Copy link
Contributor

anaruse commented Jan 20, 2025

@anaruse Can you review the code again?

It looks good to me.

rapids-bot bot pushed a commit that referenced this pull request Jan 23, 2025
… params (#569)

This PR updates the default chunk size of the CAGRA graph extension and also adds a knob to control the batch size of the CAGRA searches run inside for better throughput.

The default chunk size was set to 1 in the current implementation because there is a potential problem with low recall when the chunk size is large, because no edges are made within nodes in the same chunk. However, as I have investigated, the low recall problem rarely occurs with large chunk sizes.

# Search performance

The performance was measured after applying a bugfix #565

## degree = 32


![extend-ir0 9-degree32](https://github.com/user-attachments/assets/a5bb2fb6-8c12-49ad-b96a-1b384d79a96b)


(I don't know the reason the performance is unstable in NYTimes.)

## degree = 64
![extend-ir0 9-degree64](https://github.com/user-attachments/assets/8e926e1c-d772-4682-9419-9cc027f09d3f)

So I increase the default chunk size to the size of the new dataset vectors for better throughput in this PR. I also make public a knob to control the search batch size in the `extend' function to control the balance between throughput and memory consumption.

Authors:
  - tsuki (https://github.com/enp1s0)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Tamas Bela Feher (https://github.com/tfeher)

URL: #569
@cjnolet
Copy link
Member

cjnolet commented Jan 24, 2025

/merge

@rapids-bot rapids-bot bot merged commit dc00f80 into rapidsai:branch-25.02 Jan 24, 2025
61 checks passed
@enp1s0 enp1s0 deleted the fix-cagra-graph-optimization-bug branch January 24, 2025 01:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cpp non-breaking Introduces a non-breaking change
Projects
Development

Successfully merging this pull request may close these issues.

3 participants