Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][Offload] fix indexEntries NullPointerException error #22035

Closed

Conversation

graysonzeng
Copy link
Contributor

@graysonzeng graysonzeng commented Feb 6, 2024

Motivation

Offloader exception occurs

21:54:32.360 [offloader-OrderedScheduler-14-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Unknown exception for ManagedLedgerException.
org.apache.bookkeeper.mledger.ManagedLedgerException$OffloadReadHandleClosedException: Offload read handle already closed
21:54:32.424 [offloader-OrderedScheduler-14-0] WARN  org.apache.bookkeeper.common.util.SingleThreadSafeScheduledExecutorService - Unexpected throwable from task class org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreBackedReadHandleImpl$$Lambda$2013/0x00007fdd98b45d08: Cannot invoke "java.util.TreeMap.clear()" because "this.indexEntries" is null
java.lang.NullPointerException: Cannot invoke "java.util.TreeMap.clear()" because "this.indexEntries" is null
	at org.apache.bookkeeper.mledger.offload.jcloud.impl.OffloadIndexBlockImpl.recycle(OffloadIndexBlockImpl.java:97) ~[?:?]
	at org.apache.bookkeeper.mledger.offload.jcloud.impl.OffloadIndexBlockImpl.close(OffloadIndexBlockImpl.java:358) ~[?:?]
	at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreBackedReadHandleImpl.lambda$closeAsync$0(BlobStoreBackedReadHandleImpl.java:102) ~[?:?]
	at org.apache.bookkeeper.common.util.SingleThreadSafeScheduledExecutorService$SafeRunnable.run(SingleThreadSafeScheduledExecutorService.java:46) ~[org.apache.bookkeeper-bookkeeper-common-4.16.3.jar:4.16.3]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.104.Final.jar:4.1.104.Final]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]

Modifications

check indexEntries is not null

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions github-actions bot added doc-required Your PR changes impact docs and you will update later. doc-not-needed Your PR changes do not impact docs and removed doc-required Your PR changes impact docs and you will update later. labels Feb 6, 2024
@graysonzeng graysonzeng changed the title [fix][Offload] fix indexEntries is null error [fix][Offload] fix indexEntries NullPointerException error Feb 6, 2024
@Technoboy- Technoboy- added this to the 3.3.0 milestone Feb 6, 2024
Copy link
Contributor

@Technoboy- Technoboy- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The close method maybe be called more than once, so cause the issue. I think we can just let indexEntries=null, no need to clear it

@graysonzeng
Copy link
Contributor Author

The close method maybe be called more than once, so cause the issue. I think we can just let indexEntries=null, no need to clear it

@Technoboy- thanks. done

@lhotari
Copy link
Member

lhotari commented Feb 12, 2024

The close method maybe be called more than once, so cause the issue. I think we can just let indexEntries=null, no need to clear it

Wouldn't that be a problem if the object instance gets recycled multiple times?

@Technoboy-
Copy link
Contributor

The close method maybe be called more than once, so cause the issue. I think we can just let indexEntries=null, no need to clear it

Wouldn't that be a problem if the object instance gets recycled multiple times?

maybe

@lhotari
Copy link
Member

lhotari commented Feb 18, 2024

The close method maybe be called more than once, so cause the issue. I think we can just let indexEntries=null, no need to clear it

Wouldn't that be a problem if the object instance gets recycled multiple times?

maybe

There have been bugs in the past with recycled objects that are caused by releasing the object multiple times.

@Technoboy-
Copy link
Contributor

The close method maybe be called more than once, so cause the issue. I think we can just let indexEntries=null, no need to clear it

Wouldn't that be a problem if the object instance gets recycled multiple times?

maybe

There have been bugs in the past with recycled objects that are caused by releasing the object multiple times.

yes, but for this patch, it's ok to fix it like this, right?

@lhotari
Copy link
Member

lhotari commented Feb 20, 2024

The close method maybe be called more than once, so cause the issue. I think we can just let indexEntries=null, no need to clear it

Wouldn't that be a problem if the object instance gets recycled multiple times?

maybe

There have been bugs in the past with recycled objects that are caused by releasing the object multiple times.

yes, but for this patch, it's ok to fix it like this, right?

I doubt that it's correct. The problem will become worse if it is "fixed" like this. I think that it is necessary to address the root cause.

@lhotari
Copy link
Member

lhotari commented Feb 23, 2024

The close method maybe be called more than once, so cause the issue. I think we can just let indexEntries=null, no need to clear it

Wouldn't that be a problem if the object instance gets recycled multiple times?

maybe

There have been bugs in the past with recycled objects that are caused by releasing the object multiple times.

yes, but for this patch, it's ok to fix it like this, right?

@Technoboy- @graysonzeng I have shared more context in #22110 about the "double release" bug pattern.

@dao-jun
Copy link
Member

dao-jun commented Feb 24, 2024

it looks like ML read entries from an already closed ledger, read entries from a closed ledger will lead to exception, and then, ML will try to close the ledger again.

I believe the key point is ML trying to read entries from a closed ledger.

@@ -94,7 +94,6 @@ public void recycle() {
dataObjectLength = -1;
dataHeaderLength = -1;
segmentMetadata = null;
indexEntries.clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this line cannot fix the root cause

@dao-jun
Copy link
Member

dao-jun commented Feb 24, 2024

@graysonzeng could please provide the steps that I can reproduce the issue?

@graysonzeng
Copy link
Contributor Author

@graysonzeng could please provide the steps that I can reproduce the issue?

This is an occasional error that occurs only once. I can't try to reproduce it @dao-jun

@dao-jun
Copy link
Member

dao-jun commented Feb 28, 2024

@graysonzeng what's your pulsar version? could you please provide more log?

@dao-jun
Copy link
Member

dao-jun commented Feb 29, 2024

Closing Handle multiple times does indeed occur frequently, because ledger#closeAsync is called by async thread when read entries failed, if there are some ongoing read operations, they will also try to read the ledger which is already closed.

and LedgerHandle has also taken measures to address it(https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L537).
But offloader didn't, I'll create another PR to fix it.

@dao-jun
Copy link
Member

dao-jun commented Feb 29, 2024

related PR: #22162

@graysonzeng
Copy link
Contributor Author

graysonzeng commented Feb 29, 2024

Thanks for the fix, I'll close it once the related PR is merged @dao-jun

@dao-jun
Copy link
Member

dao-jun commented Feb 29, 2024

the pr closed automatically since #22162 merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-not-needed Your PR changes do not impact docs ready-to-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants