-
Notifications
You must be signed in to change notification settings - Fork 908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix negative position read on BufferedChannel cause data can't read problem. #4021
base: master
Are you sure you want to change the base?
Fix negative position read on BufferedChannel cause data can't read problem. #4021
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very interesting; I'm tracing it now.
I checkout the code to release-4.15.4, but the code in my local does not match your stack. Did you modify some code in your environment? |
@horizonzy hi, this should be the origin stack, the arthas tool may inject some trace. move the line number.
50236
|
I checked the code, the Besides this pr, we can override the |
currentPosition)) <= 0) { | ||
try { | ||
// We don't have it in the buffer, so put necessary data in the buffer | ||
readBufferStartPosition = currentPosition; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can check the pos
in advance; if it is negative, throw an exception directly. This pr will invalidate the read buffer cache. Maybe we can reuse the read buffer cache at next time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to method early statement to protect negative pos read.
@@ -243,6 +243,11 @@ public long forceWrite(boolean forceMetadata) throws IOException { | |||
|
|||
@Override | |||
public synchronized int read(ByteBuf dest, long pos, int length) throws IOException { | |||
// protect negative position read | |||
if (pos < 0) { | |||
throw new IllegalArgumentException("Negative position pos:" + pos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IllegalArgumentException
is an unchecked exception, can we use IOException
and catch it in the program?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for review. my concern on using IOException is that current code base seems not process IOException directly. I didn't check what will happen if the IOException throws outside. For this case I think the IllegalArgumentException
is the same as origin behaviour as this is the same exception throw by jdk.
Descriptions of the changes in this PR:
Motivation
fix #4022
in brief of this change.
When do compaction. worker thread try to extract the ledgerLogger metadata header to get ledger map.
the header to point map offset is -1.
so the read request to BufferedChannel will reject this read and throw
java.lang.IllegalArgumentException: Negative position
but there is a read cache in
BufferedChannel
the -1 is cached in the Channel which will cause following read on this channel fail to get right data.more other background:
our production env bookie can't gc log files.
after add some log we found that
but using bookkeeper tool to read the log it success without throw exception
you can see the pos in BufferedChannel is changed like this which is not very good.
and we found the log header is which offset and count is both -1
see
bookkeeper/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/DefaultEntryLogger.java
Lines 1034 to 1047 in fb0d3d6
so the log entry header offset pos is -1.
when -1 pass to
BufferedReadChannel
an
java.lang.IllegalArgumentException: Negative position
throws first time. what is worse is that this position will be cached in BufferedReadChannel see line 92 throw exception but readPosition is not reset, so the position is -1 even after the exception is thrown.bookkeeper/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/BufferedReadChannel.java
Lines 88 to 99 in fb0d3d6
so when the second time try to execute
extractEntryLogMetadataByScanning
the BufferedReadChannel is cached in ThreadLocal so the same BufferedReadChannel is return again. but this time if will hit this conditionbookkeeper/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/BufferedReadChannel.java
Lines 77 to 84 in fb0d3d6
also we saw this with arthas. see readBufferStartPosition the first time is 0 which is when
extractEntryLogMetadataFromIndex
execute first.the second capture you can see this position is -1.
so the third time when execute
extractMetaFromEntryLogs
the position is a very small negivate number.so the wrong data is returned.
still have no idea why the entry log header save -1.
Changes
clean cached read buffer and readPosition when read from BufferedChannel throw exception