
Concurrency optimization for native graph loading #2345

Open · wants to merge 1 commit into main from concurrent-graph-load
Conversation

@Gankris96 commented Dec 19, 2024

Description

Fixes #2265

Refactors the graph load into a two-step approach, detailed in #2265 (comment).

This moves the opening of the IndexInput file outside the synchronized block, so the graph file can be downloaded in parallel even though the graph load and createIndexAllocation remain inside the synchronized block.
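For illustration, a minimal sketch of the two-step pattern. Names like openVectorIndex and createIndexAllocation follow the snippets quoted later in this conversation; the lock shape and method signatures here are assumptions, not the exact diff:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Sketch only: illustrates the two-step load, not the PR's exact code.
class TwoStepGraphLoader {
    private final ConcurrentHashMap<String, ReentrantLock> indexLocks = new ConcurrentHashMap<>();

    long load(String key) {
        // Step 1: open the graph file with no shared lock held, so remote-backed
        // indices can download their files in parallel across threads.
        byte[] graphFile = openVectorIndex(key);

        // Step 2: keep allocation and the native load behind a per-key lock,
        // so each graph is loaded into native memory at most once.
        ReentrantLock lock = indexLocks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        try {
            return createIndexAllocation(graphFile);
        } finally {
            lock.unlock();
        }
    }

    private byte[] openVectorIndex(String key) {
        return new byte[0]; // placeholder: opens (and possibly downloads) the file
    }

    private long createIndexAllocation(byte[] graphFile) {
        return 0L; // placeholder: loads the graph into native memory
    }
}
```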

Related Issues

Resolves #2265

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@Gankris96 force-pushed the concurrent-graph-load branch from 7cb8710 to 8e90b88 on December 19, 2024 01:30
@Gankris96 changed the title from "Concurrency optimization for graph native loading" to "Concurrency optimization for native graph loading" on Dec 19, 2024
@navneet1v (Collaborator)

Please add an entry in the changelog.

@0ctopus13prime (Collaborator)

Hi @Gankris96, thank you for the PR.
I can see this will clearly benefit cases where multiple threads are competing with each other. But just curious: after this fix, how much performance gain do you see?

@Gankris96 (Author) commented Dec 20, 2024

Hi @0ctopus13prime, yes, I am working on getting the benchmarking numbers, primarily testing on a remote-store-backed index to see the performance gains.

Will update with benchmarking numbers soon.

@navneet1v (Collaborator)

@Gankris96 please fix the failing CIs.

@Gankris96 force-pushed the concurrent-graph-load branch 6 times, most recently from f981b83 to 79392bc on December 31, 2024 01:15
@Gankris96 force-pushed the concurrent-graph-load branch from 33afd58 to ecdb8fa on January 9, 2025 01:05
@navneet1v (Collaborator) commented Jan 9, 2025

> The searchable snapshot case shows improvement @navneet1v

Great, this is what was expected.

I think the first query shows the improvement:

| Query # | Without Fix (ms) | With Fix (ms) | Delta (ms) | Improvement |
|---------|------------------|---------------|------------|-------------|
| Query 1 | 3127.25          | 108.58        | -3018.67   | -96.5%      |

because that is when all the graph files are downloaded from the remote store. Once they are downloaded, the gains drop off; this fix targets the first query, or queries for which the graph files have been evicted from disk.

@Vikasht34 (Contributor) left a comment

Could we make our unit and integration tests more robust for the code changes we are making? For example (a sketch of case 1 follows the list):

1. For isIndexGraphFileOpened(): ensure that openVectorIndex does nothing if the index graph file is already opened.
2. Verify that the method extracts the vector file name correctly and proceeds to load the index without errors.
3. Pass an invalid cache key that does not contain a vector file name and verify that the method throws an IllegalStateException with the correct error message.
4. Mock the directory.openInput method to return a valid IndexInput and verify that readStream and indexInputWithBuffer are initialized correctly.
5. Verify that readStream.seek(0) is called successfully.
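A hedged sketch of case 1, using JUnit 4. FakeContext is a hypothetical stand-in for the plugin's NativeMemoryEntryContext; the real test would exercise that class directly:

```java
import static org.junit.Assert.assertEquals;

import java.util.concurrent.atomic.AtomicInteger;

import org.junit.Test;

public class OpenVectorIndexTests {

    // Hypothetical stand-in for NativeMemoryEntryContext.
    static class FakeContext {
        private boolean indexGraphFileOpened = false;
        final AtomicInteger openCalls = new AtomicInteger();

        void openVectorIndex() {
            if (indexGraphFileOpened) {
                return; // case 1: already opened, so do nothing
            }
            openCalls.incrementAndGet(); // stands in for directory.openInput(...)
            indexGraphFileOpened = true;
        }
    }

    @Test
    public void testOpenVectorIndexIsNoOpWhenAlreadyOpened() {
        FakeContext context = new FakeContext();
        context.openVectorIndex();
        context.openVectorIndex(); // second call must be a no-op
        assertEquals(1, context.openCalls.get());
    }
}
```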

```diff
@@ -350,7 +352,11 @@ public NativeMemoryAllocation get(NativeMemoryEntryContext<?> nativeMemoryEntryC
                 return result;
             }
         } else {
-            return cache.get(nativeMemoryEntryContext.getKey(), nativeMemoryEntryContext::load);
+            // open graphFile before load
+            try (nativeMemoryEntryContext) {
```
Contributor:

There could be a case where multiple threads trigger eviction and graph loading concurrently, leading to temporary spikes in memory usage. Can we think of using bounded concurrency for eviction and graph-loading tasks with thread pools?
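For illustration, a hedged sketch of one way to bound load concurrency with a fixed-size executor, as the comment suggests. The pool size, class, and method names are assumptions, not the plugin's code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: routes graph loads through a fixed-size pool so at most
// MAX_LOADS native loads run at once; excess requests queue in the executor
// instead of all allocating native memory simultaneously.
class BoundedGraphLoadExecutor {
    private static final int MAX_LOADS = 4; // assumed bound
    private final ExecutorService pool = Executors.newFixedThreadPool(MAX_LOADS);

    Future<Long> submitLoad(String key) {
        return pool.submit(() -> loadGraph(key));
    }

    private long loadGraph(String key) {
        return 0L; // placeholder for the actual open + native load
    }
}
```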

@Gankris96 (Author):

Will take it up in a separate issue.

Member:

This is a fair callout. I think we need to improve our cache operations in general. The problem right now is that cache operations can be async in nature (cleanup, eviction), whereas we use the cache as a 1:1 reference for the off-heap memory in use. We can create a tracking issue and deal with this separately.

```diff
-            return cache.get(nativeMemoryEntryContext.getKey(), nativeMemoryEntryContext::load);
+            // open graphFile before load
+            try (nativeMemoryEntryContext) {
+                nativeMemoryEntryContext.openVectorIndex();
```
Contributor:

Can we avoid the case where the graph is partially loaded or an error occurs during loading, which ends up leaving the cache in an inconsistent state? Can we ensure atomicity in graph loading and only put an entry in the cache if loading succeeds?

@Gankris96 (Author):

If there is an error in graph loading, the entry will not be in the cache. What would be the scenario where the cache ends up in an inconsistent state?
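(For context, a hedged illustration of that guarantee, using Caffeine as a stand-in for the plugin's actual cache: compute-if-absent only inserts the mapping after the loader returns, so a loader that throws leaves no entry behind.)

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

class FailedLoadLeavesNoEntry {
    public static void main(String[] args) {
        Cache<String, Long> cache = Caffeine.newBuilder().maximumSize(100).build();
        try {
            // The loader throws, so no mapping is ever created for this key.
            cache.get("graph-file", key -> { throw new IllegalStateException("load failed"); });
        } catch (IllegalStateException e) {
            // expected: the failed load propagates to the caller
        }
        // With no mapping created, getIfPresent returns null: no partial entry remains.
        System.out.println(cache.getIfPresent("graph-file")); // prints: null
    }
}
```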

Member:

@Gankris96 Can we wrap this call behind the same lock-based logic above, just to make sure we do not open the same index files concurrently in two different threads?

@Gankris96 (Author):

Wrapping this within a lock still seems to fail some bwc search tests, where we end up getting incorrect results. Even so, it would not really help, because it does not solve the underlying problem of multiple graph files getting loaded at the same time now that the load is no longer synchronized.
This probably requires revisiting in a separate issue where we refactor the whole cache strategy, imo.

Member:

Please create an issue so that we can track it.

@Gankris96 (Author):

It turns out the bwc failure was a different issue, unrelated to this. I added back the locking logic for this as well; it seems to work fine, so we can keep it in.

navneet1v previously approved these changes Jan 13, 2025
@navneet1v self-requested a review January 13, 2025 19:07
@navneet1v (Collaborator)

@Gankris96 can you please fix the CIs

@Gankris96 force-pushed the concurrent-graph-load branch 2 times, most recently from 68c067b to daae55d on January 17, 2025 02:39
@Gankris96 (Author)

@Vikasht34 @kotwanikunal @navneet1v I have updated with some additional locking around openVectorIndex and added some UTs for it. Please take a look.

@Vikasht34 (Contributor) left a comment

Thanks for addressing the comments. Looks good to me.

@Gankris96 force-pushed the concurrent-graph-load branch 2 times, most recently from 9edec4e to 2d90b20 on January 21, 2025 22:15
@Gankris96 force-pushed the concurrent-graph-load branch 7 times, most recently from d8e017a to 4fdd28e on January 22, 2025 22:58
@Gankris96 force-pushed the concurrent-graph-load branch from 4fdd28e to 4dc8449 on January 23, 2025 03:51
@Gankris96 (Author)

@navneet1v @jmazanec15 please take a look and approve if all looks good.

@shatejas (Collaborator) left a comment

Looks good overall. Some comments related to code maintenance.

Comment on lines +352 to +355:

```java
ReentrantLock indexFileLock = indexLocks.computeIfAbsent(key, k -> new ReentrantLock());
indexFileLock.lock();
nativeMemoryEntryContext.openVectorIndex();
indexFileLock.unlock();
```

@shatejas (Collaborator):

Can we please have a private method openIndex() here, so this is taken care of as and when the code changes?
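A hedged sketch of the suggested extraction; the signature is an assumption, and wrapping the call in try/finally additionally guarantees the lock is released if openVectorIndex throws:

```java
// Sketch only: field and method names follow the snippet above.
private void openIndex(String key, NativeMemoryEntryContext<?> nativeMemoryEntryContext) {
    ReentrantLock indexFileLock = indexLocks.computeIfAbsent(key, k -> new ReentrantLock());
    indexFileLock.lock();
    try {
        nativeMemoryEntryContext.openVectorIndex();
    } finally {
        indexFileLock.unlock(); // released even if openVectorIndex throws
    }
}
```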

Comment on lines +360 to +366:

```java
// recheck if another thread already loaded this entry into the cache
result = cache.getIfPresent(key);
if (result != null) {
    accessRecencyQueue.remove(key);
    accessRecencyQueue.addLast(key);
    return result;
}
```

@shatejas (Collaborator):

A private method for this as well? There will be an additional null check, but every time get returns, accessRecency should be updated.
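A hedged sketch of such a helper; the names follow the snippet above, and the signature is an assumption:

```java
// Sketch only: returns null when the key is absent, so callers keep their
// existing null check, but every cache hit updates recency in one place.
private NativeMemoryAllocation getAndUpdateRecency(String key) {
    NativeMemoryAllocation result = cache.getIfPresent(key);
    if (result != null) {
        // move the key to the tail so it counts as most recently used
        accessRecencyQueue.remove(key);
        accessRecencyQueue.addLast(key);
    }
    return result;
}
```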

@navneet1v (Collaborator)

Overall looks good to me, and I agree with @shatejas's comments. Please resolve them so that we can ship this change.

Successfully merging this pull request may close these issues:

[FEATURE] Concurrency optimizations with native memory graph loading and force eviction