From 69ee291b4f17de89000ddeede832588b72a66322 Mon Sep 17 00:00:00 2001
From: dao-jun
Date: Fri, 10 Jan 2025 12:25:58 +0800
Subject: [PATCH] Introduce per ledger properties

---
 ...pip-xyz-Introduce-per-ledger-properties.md | 91 +++++++++++++++++++
 1 file changed, 91 insertions(+)
 create mode 100644 pip/pip-xyz-Introduce-per-ledger-properties.md

diff --git a/pip/pip-xyz-Introduce-per-ledger-properties.md b/pip/pip-xyz-Introduce-per-ledger-properties.md
new file mode 100644
index 0000000000000..b5475ca3783b5
--- /dev/null
+++ b/pip/pip-xyz-Introduce-per-ledger-properties.md
@@ -0,0 +1,91 @@
+# PIP-XXX: Introduce per-ledger properties
+
+# Background knowledge
+
+BookKeeper has no secondary index, so we can't query entries by message metadata efficiently.
+The `ManagedCursor` provides a method, `asyncFindNewestMatching`, that finds the newest entry matching a given
+predicate by binary search (see `OpFindNewest.java`).
+In https://github.com/apache/pulsar/pull/22792, we optimized seeking by timestamp: `LedgerInfo#timestamp` is used
+to calculate the range of ledgers that may contain the target timestamp, so we don't need to scan all
+ledgers.
+
+However, when `AppendIndexMetadataInterceptor` is enabled and we want to query entries by `BrokerEntryMetadata#index`,
+there is no equally efficient way:
+we have to binary-search across all ledgers to find the target entry.
+
+# Motivation
+
+Introduce per-ledger properties so that extra per-ledger information can be stored in `LedgerInfo`.
+This lets us query entries by an incremental index, such as `BrokerEntryMetadata#index`, more efficiently.
+
+# Goals
+
+## In Scope
+
+* Provide a generic way to set per-ledger properties that any plugin can use.
+
+## Out of Scope
+
+* Setting the extra per-ledger properties themselves. This should be done by the specific plugin, such as `KoP`.
+
+# High Level Design
+
+We can store the incremental index of the first entry in each ledger, say
+`BrokerEntryMetadata#index`, in `LedgerInfo#properties`.
+When we want to query entries by `BrokerEntryMetadata#index`, we can calculate the range of ledgers that may contain the
+target index from `LedgerInfo#properties`, so we don't need to scan all ledgers.
+
+In https://github.com/apache/pulsar/pull/22792, we provided a new method to find the newest entry within a given range of entries:
+```java
+void asyncFindNewestMatching(FindPositionConstraint constraint, Predicate<Entry> condition,
+                             Position startPosition, Position endPosition, FindEntryCallback callback,
+                             Object ctx, boolean isFindFromLedger) {
+}
+```
+We can use this method directly in the above scenario.
+
+# Detailed Design
+
+## Public-facing Changes
+
+### Public API
+
+* Add the following methods to `ManagedLedger`:
+
+```java
+CompletableFuture asyncAddLedgerProperty(long ledgerId, String key, String value);
+CompletableFuture asyncRemoveLedgerProperty(long ledgerId, String key);
+```
+
+### Binary protocol
+
+* Add a new field `properties` to `LedgerInfo`:
+
+```protobuf
+message LedgerInfo {
+    required int64 ledgerId = 1;
+    optional int64 entries = 2;
+    optional int64 size = 3;
+    optional int64 timestamp = 4;
+    optional OffloadContext offloadContext = 5;
+    // Add the following field
+    repeated KeyValue properties = 6;
+}
+```
+
+# Backward & Forward Compatibility
+
+This change is fully backward compatible: `properties` is a new `repeated` field, so existing `LedgerInfo` records
+that lack it remain valid.
+
+# Alternatives
+
+None
+
+# Links
+
+
+
+* Mailing List discussion thread:
+* Mailing List voting thread:
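
As a rough illustration of the High Level Design in this patch, the sketch below shows how a per-ledger "first index" property could narrow a lookup to a single candidate ledger before any entries are read. It is not Pulsar code. The property key `firstIndex`, the class `LedgerIndexLookup`, and the plain `Map` standing in for `LedgerInfo#properties` are all assumptions made for illustration, since the proposal does not define key names or lookup helpers.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.TreeMap;

public class LedgerIndexLookup {
    // Hypothetical property key; the proposal does not define any key names.
    static final String FIRST_INDEX_KEY = "firstIndex";

    /**
     * Picks the ledger that may contain targetIndex: the one with the largest
     * recorded first index that is less than or equal to targetIndex.
     *
     * @param ledgerProperties ledgerId -> properties, a stand-in for LedgerInfo#properties
     */
    static Optional<Long> findCandidateLedger(Map<Long, Map<String, String>> ledgerProperties,
                                              long targetIndex) {
        // Re-key the ledgers by their recorded first index so floorEntry can be used.
        NavigableMap<Long, Long> ledgerByFirstIndex = new TreeMap<>();
        for (Map.Entry<Long, Map<String, String>> e : ledgerProperties.entrySet()) {
            String firstIndex = e.getValue().get(FIRST_INDEX_KEY);
            if (firstIndex != null) { // ledgers without the property are skipped in this sketch
                ledgerByFirstIndex.put(Long.parseLong(firstIndex), e.getKey());
            }
        }
        Map.Entry<Long, Long> floor = ledgerByFirstIndex.floorEntry(targetIndex);
        return floor == null ? Optional.empty() : Optional.of(floor.getValue());
    }

    public static void main(String[] args) {
        Map<Long, Map<String, String>> ledgers = new TreeMap<>();
        ledgers.put(10L, Map.of(FIRST_INDEX_KEY, "0"));
        ledgers.put(11L, Map.of(FIRST_INDEX_KEY, "100"));
        ledgers.put(12L, Map.of(FIRST_INDEX_KEY, "250"));
        System.out.println(findCandidateLedger(ledgers, 150L)); // prints Optional[11]
    }
}
```

In a real plugin, the candidate ledger would then bound the `startPosition` and `endPosition` passed to `asyncFindNewestMatching`. Ledgers that predate the property would need a fallback, such as the existing full binary search, which this sketch leaves out.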