[fix][Offload] fix indexEntries NullPointerException error #22035

graysonzeng · 2024-02-06T12:21:48Z

Motivation

The offloader throws the following exception:

21:54:32.360 [offloader-OrderedScheduler-14-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Unknown exception for ManagedLedgerException.
org.apache.bookkeeper.mledger.ManagedLedgerException$OffloadReadHandleClosedException: Offload read handle already closed
21:54:32.424 [offloader-OrderedScheduler-14-0] WARN  org.apache.bookkeeper.common.util.SingleThreadSafeScheduledExecutorService - Unexpected throwable from task class org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreBackedReadHandleImpl$$Lambda$2013/0x00007fdd98b45d08: Cannot invoke "java.util.TreeMap.clear()" because "this.indexEntries" is null
java.lang.NullPointerException: Cannot invoke "java.util.TreeMap.clear()" because "this.indexEntries" is null
	at org.apache.bookkeeper.mledger.offload.jcloud.impl.OffloadIndexBlockImpl.recycle(OffloadIndexBlockImpl.java:97) ~[?:?]
	at org.apache.bookkeeper.mledger.offload.jcloud.impl.OffloadIndexBlockImpl.close(OffloadIndexBlockImpl.java:358) ~[?:?]
	at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreBackedReadHandleImpl.lambda$closeAsync$0(BlobStoreBackedReadHandleImpl.java:102) ~[?:?]
	at org.apache.bookkeeper.common.util.SingleThreadSafeScheduledExecutorService$SafeRunnable.run(SingleThreadSafeScheduledExecutorService.java:46) ~[org.apache.bookkeeper-bookkeeper-common-4.16.3.jar:4.16.3]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.104.Final.jar:4.1.104.Final]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]

Modifications

Check that indexEntries is not null before calling clear() on it in recycle().
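A minimal sketch of the failure mode and of the null check, assuming that recycle() clears the map and then drops the reference, so that a second close() finds it null. `IndexBlockSketch` and its methods are hypothetical stand-ins for illustration, not the real OffloadIndexBlockImpl.

```java
import java.util.TreeMap;

// Hypothetical stand-in for an index block; NOT the real OffloadIndexBlockImpl.
class IndexBlockSketch implements AutoCloseable {
    private TreeMap<Long, String> indexEntries = new TreeMap<>();

    void put(long entryId, String value) {
        indexEntries.put(entryId, value);
    }

    // Unguarded variant (assumed shape): the second call dereferences null.
    void recycleUnguarded() {
        indexEntries.clear();
        indexEntries = null;
    }

    // Guarded variant: the null check described under Modifications.
    void recycleGuarded() {
        if (indexEntries != null) {
            indexEntries.clear();
            indexEntries = null;
        }
    }

    @Override
    public void close() {
        recycleGuarded();
    }
}

class DoubleCloseDemo {
    public static void main(String[] args) {
        IndexBlockSketch unguarded = new IndexBlockSketch();
        unguarded.put(1L, "entry");
        unguarded.recycleUnguarded();
        try {
            unguarded.recycleUnguarded(); // second call: indexEntries is null
        } catch (NullPointerException e) {
            System.out.println("second unguarded recycle threw NullPointerException");
        }

        IndexBlockSketch guarded = new IndexBlockSketch();
        guarded.close();
        guarded.close(); // second call is a no-op
        System.out.println("second guarded close was a no-op");
    }
}
```

The guard only silences the symptom. It does not explain why close() runs twice in the first place.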

Verifying this change

Make sure that the change passes the CI checks.

Technoboy-

The close method may be called more than once, which causes the issue. I think we can just set indexEntries = null; there is no need to clear it.

graysonzeng · 2024-02-07T01:23:44Z

@Technoboy- thanks. done

lhotari · 2024-02-12T14:11:57Z

Wouldn't that be a problem if the object instance gets recycled multiple times?

Technoboy- · 2024-02-13T13:35:48Z

maybe

lhotari · 2024-02-18T08:44:45Z

There have been bugs in the past with recycled objects that are caused by releasing the object multiple times.
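A small sketch of why releasing a pooled object twice is dangerous. `TinyPool` is a hypothetical, deliberately naive pool written for illustration; it is not the recycler used by the offloader, and a real pool may behave differently.

```java
import java.util.ArrayDeque;

// Hypothetical minimal object pool with no double-release protection.
class TinyPool {
    static final class Pooled {
        String owner; // whoever currently uses the instance
    }

    private final ArrayDeque<Pooled> free = new ArrayDeque<>();

    Pooled acquire() {
        Pooled p = free.poll();
        return p != null ? p : new Pooled();
    }

    void release(Pooled p) {
        p.owner = null;
        free.push(p); // nothing stops the same instance from being pushed twice
    }
}

class DoubleReleaseDemo {
    public static void main(String[] args) {
        TinyPool pool = new TinyPool();
        TinyPool.Pooled x = pool.acquire();
        pool.release(x);
        pool.release(x); // double release: x now sits in the free list twice

        TinyPool.Pooled a = pool.acquire();
        a.owner = "reader-A";
        TinyPool.Pooled b = pool.acquire();
        b.owner = "reader-B";

        System.out.println(a == b);  // prints "true": two users share one instance
        System.out.println(a.owner); // prints "reader-B": reader-A's state was overwritten
    }
}
```

In this sketch, hiding the exception from the second release would let two unrelated callers silently share and overwrite the same instance.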

Technoboy- · 2024-02-20T17:15:23Z

Yes, but for this patch it's OK to fix it like this, right?

lhotari · 2024-02-20T20:49:34Z

I doubt that it's correct. The problem will become worse if it is "fixed" like this. I think that it is necessary to address the root cause.

lhotari · 2024-02-23T20:54:44Z

@Technoboy- @graysonzeng I have shared more context in #22110 about the "double release" bug pattern.

dao-jun · 2024-02-24T13:40:19Z

It looks like the managed ledger (ML) read entries from an already closed ledger. Reading entries from a closed ledger leads to an exception, and ML then tries to close the ledger again.

I believe the key point is that ML is trying to read entries from a closed ledger.

dao-jun · 2024-02-24T13:41:13Z

...d/src/main/java/org/apache/bookkeeper/mledger/offload/jcloud/impl/OffloadIndexBlockImpl.java

@@ -94,7 +94,6 @@ public void recycle() {
        dataObjectLength = -1;
        dataHeaderLength = -1;
        segmentMetadata = null;
-        indexEntries.clear();


Removing this line does not fix the root cause.

dao-jun · 2024-02-24T13:45:47Z

@graysonzeng could you please provide steps so that I can reproduce the issue?

graysonzeng · 2024-02-28T06:53:57Z

This is an intermittent error that has occurred only once, so I can't reproduce it. @dao-jun

dao-jun · 2024-02-28T10:00:36Z

@graysonzeng What's your Pulsar version? Could you please provide more logs?

dao-jun · 2024-02-29T04:46:44Z

Closing a handle multiple times does indeed occur frequently. ledger#closeAsync is called by an async thread when reading entries fails; if other read operations are still in flight, they will also try to read the ledger, which is already closed.

LedgerHandle has already taken measures to handle this (https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L537), but the offloader has not. I'll create another PR to fix it.
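One common way to make an asynchronous close safe to call repeatedly is to let only the first caller release resources and have every later caller receive the same future. The sketch below illustrates that idea under stated assumptions: `IdempotentReadHandleSketch`, `releaseResources`, and `releaseCount` are hypothetical names, and this is not necessarily how LedgerHandle or the follow-up PR implements it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical read handle; names are illustrative, not Pulsar's or BookKeeper's API.
class IdempotentReadHandleSketch {
    private final AtomicReference<CompletableFuture<Void>> closeFuture = new AtomicReference<>();
    final AtomicInteger releaseCount = new AtomicInteger(); // counts real releases

    CompletableFuture<Void> closeAsync() {
        CompletableFuture<Void> mine = new CompletableFuture<>();
        // Only the first caller installs its future; later callers reuse it.
        if (!closeFuture.compareAndSet(null, mine)) {
            return closeFuture.get();
        }
        try {
            releaseResources();
            mine.complete(null);
        } catch (Throwable t) {
            mine.completeExceptionally(t);
        }
        return mine;
    }

    boolean isClosed() {
        return closeFuture.get() != null;
    }

    // Stand-in for closing streams and recycling the index block.
    private void releaseResources() {
        releaseCount.incrementAndGet();
    }
}

class IdempotentCloseDemo {
    public static void main(String[] args) {
        IdempotentReadHandleSketch handle = new IdempotentReadHandleSketch();
        CompletableFuture<Void> first = handle.closeAsync();
        CompletableFuture<Void> second = handle.closeAsync();
        System.out.println(first == second);           // prints "true"
        System.out.println(handle.releaseCount.get()); // prints "1"
    }
}
```

With this shape the index block would be recycled at most once per handle, so a null check in recycle() would not be needed to survive a repeated close.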

dao-jun · 2024-02-29T05:39:31Z

related PR: #22162

graysonzeng · 2024-02-29T09:29:00Z

Thanks for the fix. I'll close this PR once the related PR is merged. @dao-jun

dao-jun · 2024-02-29T13:05:42Z

This PR was closed automatically when #22162 was merged.

[fix][Offload] fix indexEntries is null error

113e88b

github-actions bot added the doc-required and doc-not-needed labels and removed the doc-required label Feb 6, 2024

graysonzeng changed the title from "[fix][Offload] fix indexEntries is null error" to "[fix][Offload] fix indexEntries NullPointerException error" Feb 6, 2024

Technoboy- assigned graysonzeng Feb 6, 2024

Technoboy- added this to the 3.3.0 milestone Feb 6, 2024

Technoboy- reviewed Feb 6, 2024

[fix][Offload] remove indexEntries.clear() to aviod NullPointerException

507ada4

Technoboy- added the ready-to-test label Feb 18, 2024

Technoboy- closed this Feb 18, 2024

Technoboy- reopened this Feb 18, 2024

lhotari mentioned this pull request Feb 23, 2024

[improve][misc][WIP] Detect "double release" and "use after release" bugs with recycled objects #22110

Draft

4 tasks

dao-jun requested changes Feb 24, 2024

dao-jun mentioned this pull request Feb 29, 2024

[fix][offload] Fix Offload readHandle cannot close multi times. #22162

Merged

15 tasks

dao-jun closed this in #22162 Feb 29, 2024
