-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add thread to periodically perform pending cache maintenance #2308
base: main
Are you sure you want to change the base?
Add thread to periodically perform pending cache maintenance #2308
Conversation
06852a4
to
8392b1d
Compare
@owenhalpert The changes look good in general. What I'd be interested in is testing under load for a resource constrained system - can you verify if this adds to the latency or impact the performance in any way? We did implement a force evict before writes with #2015. Can you also enable this feature flag and run the above tests to ensure it behaves well? |
558e7ea
to
b10f30f
Compare
@kotwanikunal I've completed the benchmarking on a single node cluster limited to 3GB of memory. Before benchmarking my code, I validated the CacheMaintainer was actually running by inspecting the OpenSearch process status in the Docker container and found that after about a minute, the Results below:
This suggests there is no significant impact on latency with my code changes. I've included the full results of these test runs here: https://gist.github.com/owenhalpert/05ad4f5ae9577f717f2c59f2039d52e4 |
30f53b7
to
dc61f6e
Compare
dc61f6e
to
1ca8dff
Compare
src/main/java/org/opensearch/knn/index/util/ScheduledExecutor.java
Outdated
Show resolved
Hide resolved
...in/java/org/opensearch/knn/quantization/models/quantizationState/QuantizationStateCache.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/util/ScheduledExecutor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/memory/NativeMemoryCacheManager.java
Outdated
Show resolved
Hide resolved
1ca8dff
to
85b1782
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments overall looks good
.../org/opensearch/knn/quantization/models/quantizationState/QuantizationStateCacheManager.java
Show resolved
Hide resolved
try { | ||
cache.cleanUp(); | ||
} catch (Exception e) { | ||
logger.error("Error cleaning up cache", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why swallow exception here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any exceptions from Guava cache operations would otherwise halt our scheduled executor. If an exception occurs here, it would be from Guava's internals rather than our logic, so per Dooyong's suggestion above we can log it and continue scheduling cleanup tasks.
This way we can ensure the cache maintenance keeps running even if individual cleanup attempts fail, and we can monitor Guava errors in the logs.
src/main/java/org/opensearch/knn/index/util/ScheduledExecutor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/util/ScheduledExecutor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/util/ScheduledExecutor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/memory/NativeMemoryCacheManager.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/memory/NativeMemoryCacheManager.java
Outdated
Show resolved
Hide resolved
|
||
TimeValue interval = KNNSettings.state().getSettingValue(KNNSettings.KNN_CACHE_ITEM_EXPIRY_TIME_MINUTES); | ||
|
||
if (threadPool != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to log an error in else
if its null. I think throwing is an option - @kotwanikunal thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think logging makes more sense here since the operations are unaffected without the maintenance thread.
We don't want the OS process to fail because of an auxiliary process.
Also, please move the runnable definition within the if block, given it would not be used otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll move the call to startMaintenance into a null check on threadPool
because the entire method has no functionality otherwise.
src/main/java/org/opensearch/knn/index/memory/NativeMemoryCacheManager.java
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/memory/NativeMemoryCacheManager.java
Outdated
Show resolved
Hide resolved
...in/java/org/opensearch/knn/quantization/models/quantizationState/QuantizationStateCache.java
Outdated
Show resolved
Hide resolved
3eb58ba
to
a7e5ff4
Compare
The Github Actions were showing some failures in the Quantization State Cache unit tests, presumably because of concurrent test runs setting/shutting down the thread pool. To address this I added a thread pool that is created and terminated only within the two cache unit test classes to be used for those tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall
src/main/java/org/opensearch/knn/index/memory/NativeMemoryCacheManager.java
Show resolved
Hide resolved
1a88678
to
ac2500b
Compare
Signed-off-by: owenhalpert <[email protected]>
…UTES Signed-off-by: owenhalpert <[email protected]> Signed-off-by: owenhalpert <[email protected]>
…UTES Signed-off-by: owenhalpert <[email protected]>
Signed-off-by: owenhalpert <[email protected]>
…ounded thread creation Signed-off-by: owenhalpert <[email protected]>
Signed-off-by: owenhalpert <[email protected]>
Signed-off-by: owenhalpert <[email protected]>
Signed-off-by: owenhalpert <[email protected]>
Signed-off-by: owenhalpert <[email protected]>
Signed-off-by: owenhalpert <[email protected]>
Signed-off-by: owenhalpert <[email protected]>
Signed-off-by: owenhalpert <[email protected]>
This reverts commit 66829ea. Signed-off-by: owenhalpert <[email protected]>
spotlessApply Signed-off-by: owenhalpert <[email protected]>
20c6e0b
to
71d9759
Compare
Signed-off-by: owenhalpert <[email protected]>
Description
Adds a ScheduledExecutor class that, for each cache, takes in a Runnable with a call to
cleanUp
and periodically executes according to the values ofKNN_CACHE_ITEM_EXPIRY_TIME_MINUTES
andQUANTIZATION_STATE_CACHE_EXPIRY_TIME_MINUTES
. This will perform any pending maintenance (such as evicting expired entries) which was previously only performed when the cache was accessed. The maintenance thread is created whenever a NativeMemoryCache or QuantizationStateCache is instantiated or rebuilt and can be shut down with either class'sclose
method. Relevant logic for cleanup was added to some testing base classes.Related Issues
Resolves #2239
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.