One broken bookie out of three replicas. #3949

Open
munish1789 opened this issue May 8, 2023 · 0 comments

BUG REPORT

Describe the bug

One of three bookie replicas is broken: it cannot start, apparently because the journal stored for recovery is corrupted.

To Reproduce

Steps to reproduce the behavior:
This was observed during a longevity run of more than 48 hours under some load. One bookie instance was unable to recover after a pod restart, possibly because of a corrupted BookKeeper journal file. The journal being opened was "/bk/journal/j0/current/18795c6b58c.txn".
In the stack trace below, while replaying that journal, it tried to read ledger 0 from a negative position.
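As a cross-check that the file named above is the journal being replayed, the file name appears to be the journal ID written in hexadecimal: the log that follows reports `Replaying journal 1681845040524`, and `18795c6b58c` parses to that value. The entry log names in the same log follow the same pattern (`e5d5.log` for logId 58837). The sketch below only does that arithmetic. The class and method names are made up for illustration, and reading the ID as a creation time in epoch milliseconds is an assumption rather than something the log states.

```java
import java.time.Instant;

public class JournalIdCheck {
    // Parses the hexadecimal stem of a file name such as "18795c6b58c.txn".
    // Hypothetical helper for illustration, not part of BookKeeper.
    static long idFromFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String stem = dot < 0 ? fileName : fileName.substring(0, dot);
        return Long.parseLong(stem, 16);
    }

    public static void main(String[] args) {
        long journalId = idFromFileName("18795c6b58c.txn");
        System.out.println(journalId);                        // 1681845040524
        // Assumption: the ID is a creation time in epoch milliseconds.
        System.out.println(Instant.ofEpochMilli(journalId));  // 2023-04-18T19:10:40.524Z
        System.out.println(idFromFileName("e5d5.log"));       // 58837
    }
}
```

If the timestamp reading is right, this journal was created on 2023-04-18, roughly two weeks before the failed start logged on 2023-05-01.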

`2023-05-01T00:28:52,129 - INFO - [main:ComponentStarter@84] - Starting component bookie-server.
2023-05-01T00:28:52,132 - INFO - [main:Bookie@995] - Replaying journal 1681845040524 from position 1329111040
2023-05-01T00:28:52,134 - INFO - [main:JournalChannel@157] - Opening journal /bk/journal/j0/current/18795c6b58c.txn
2023-05-01T00:28:52,142 - INFO - [main:EntryLogManagerBase@144] - Creating a new entry log file for ledger '111680' : diskFull = false, allDisksFull = false, reachEntryLogLimit = false, logChannel = null
2023-05-01T00:28:52,156 - INFO - [main:EntryLoggerAllocator@181] - Created new entry log file /bk/ledgers/l3/current/e5d5.log for logId 58837.
2023-05-01T00:28:52,163 - INFO - [main:EntryLogManagerBase@144] - Creating a new entry log file for ledger '109556' : diskFull = false, allDisksFull = false, reachEntryLogLimit = false, logChannel = null
2023-05-01T00:28:52,168 - INFO - [pool-5-thread-1:EntryLoggerAllocator@181] - Created new entry log file /bk/ledgers/l3/current/e5d6.log for logId 58838.
2023-05-01T00:28:52,171 - INFO - [pool-5-thread-1:EntryLoggerAllocator@181] - Created new entry log file /bk/ledgers/l2/current/e5d7.log for logId 58839.
2023-05-01T00:28:52,174 - INFO - [main:EntryLogManagerBase@144] - Creating a new entry log file for ledger '109536' : diskFull = false, allDisksFull = false, reachEntryLogLimit = false, logChannel = null
2023-05-01T00:28:52,176 - INFO - [pool-5-thread-1:EntryLoggerAllocator@181] - Created new entry log file /bk/ledgers/l2/current/e5d8.log for logId 58840.
2023-05-01T00:28:52,177 - INFO - [main:EntryLogManagerBase@144] - Creating a new entry log file for ledger '111708' : diskFull = false, allDisksFull = false, reachEntryLogLimit = false, logChannel = null
2023-05-01T00:28:52,179 - INFO - [pool-5-thread-1:EntryLoggerAllocator@181] - Created new entry log file /bk/ledgers/l0/current/e5d9.log for logId 58841.

2023-05-01T00:28:52,394 - INFO - [main:EntryLogManagerBase@144] - Creating a new entry log file for ledger '0' : diskFull = false, allDisksFull = false, reachEntryLogLimit = false, logChannel = null
2023-05-01T00:28:52,396 - INFO - [pool-5-thread-1:EntryLoggerAllocator@181] - Created new entry log file /bk/ledgers/l0/current/e5da.log for logId 58842.
2023-05-01T00:28:52,395 - ERROR - [main:LedgerEntryPage@202] - IllegalArgumentException when trying to read ledger 0 from position -541506176668401664
java.lang.IllegalArgumentException: Negative position
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:785) ~[?:?]
at org.apache.bookkeeper.bookie.FileInfo.readAbsolute(FileInfo.java:426) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.FileInfo.read(FileInfo.java:396) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.LedgerEntryPage.readPage(LedgerEntryPage.java:196) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.IndexPersistenceMgr.updatePage(IndexPersistenceMgr.java:646) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.IndexInMemPageMgr.grabLedgerEntryPage(IndexInMemPageMgr.java:447) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.IndexInMemPageMgr.getLedgerEntryPage(IndexInMemPageMgr.java:412) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.IndexInMemPageMgr.putEntryOffset(IndexInMemPageMgr.java:571) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.LedgerCacheImpl.putEntryOffset(LedgerCacheImpl.java:108) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.InterleavedLedgerStorage.processEntry(InterleavedLedgerStorage.java:539) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.InterleavedLedgerStorage.processEntry(InterleavedLedgerStorage.java:521) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.InterleavedLedgerStorage.addEntry(InterleavedLedgerStorage.java:375) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.LedgerDescriptorImpl.addEntry(LedgerDescriptorImpl.java:155) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.Bookie$6.process(Bookie.java:949) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.Journal.scanJournal(Journal.java:840) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.Bookie.replay(Bookie.java:996) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.Bookie.readJournal(Bookie.java:962) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.Bookie.start(Bookie.java:1016) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.proto.BookieServer.start(BookieServer.java:156) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.server.service.BookieService.doStart(BookieService.java:68) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:83) ~[org.apache.bookkeeper-bookkeeper-common-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.common.component.LifecycleComponentStack.lambda$start$4(LifecycleComponentStack.java:144) ~[org.apache.bookkeeper-bookkeeper-common-4.14.3-build-437.jar:4.14.3-build-437]
at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:405) ~[com.google.guava-guava-30.0-jre.jar:?]
at org.apache.bookkeeper.common.component.LifecycleComponentStack.start(LifecycleComponentStack.java:144) ~[org.apache.bookkeeper-bookkeeper-common-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.common.component.ComponentStarter.startComponent(ComponentStarter.java:85) ~[org.apache.bookkeeper-bookkeeper-common-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.server.Main.doMain(Main.java:234) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.server.Main.main(Main.java:208) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
2023-05-01T00:28:52,410 - ERROR - [main:AbstractLifecycleComponent@85] - Failed to start Component: bookie-server
java.lang.IllegalArgumentException: Negative position
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:785) ~[?:?]
at org.apache.bookkeeper.bookie.FileInfo.readAbsolute(FileInfo.java:426) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.FileInfo.read(FileInfo.java:396) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.LedgerEntryPage.readPage(LedgerEntryPage.java:196) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.IndexPersistenceMgr.updatePage(IndexPersistenceMgr.java:646) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.IndexInMemPageMgr.grabLedgerEntryPage(IndexInMemPageMgr.java:447) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.IndexInMemPageMgr.getLedgerEntryPage(IndexInMemPageMgr.java:412) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.IndexInMemPageMgr.putEntryOffset(IndexInMemPageMgr.java:571) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.LedgerCacheImpl.putEntryOffset(LedgerCacheImpl.java:108) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.InterleavedLedgerStorage.processEntry(InterleavedLedgerStorage.java:539) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.InterleavedLedgerStorage.processEntry(InterleavedLedgerStorage.java:521) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.InterleavedLedgerStorage.addEntry(InterleavedLedgerStorage.java:375) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.LedgerDescriptorImpl.addEntry(LedgerDescriptorImpl.java:155) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.Bookie$6.process(Bookie.java:949) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.Journal.scanJournal(Journal.java:840) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.Bookie.replay(Bookie.java:996) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.Bookie.readJournal(Bookie.java:962) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.bookie.Bookie.start(Bookie.java:1016) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.proto.BookieServer.start(BookieServer.java:156) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.server.service.BookieService.doStart(BookieService.java:68) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:83) ~[org.apache.bookkeeper-bookkeeper-common-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.common.component.LifecycleComponentStack.lambda$start$4(LifecycleComponentStack.java:144) ~[org.apache.bookkeeper-bookkeeper-common-4.14.3-build-437.jar:4.14.3-build-437]
at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:405) ~[com.google.guava-guava-30.0-jre.jar:?]
at org.apache.bookkeeper.common.component.LifecycleComponentStack.start(LifecycleComponentStack.java:144) ~[org.apache.bookkeeper-bookkeeper-common-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.common.component.ComponentStarter.startComponent(ComponentStarter.java:85) ~[org.apache.bookkeeper-bookkeeper-common-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.server.Main.doMain(Main.java:234) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
at org.apache.bookkeeper.server.Main.main(Main.java:208) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3-build-437.jar:4.14.3-build-437]
2023-05-01T00:28:52,411 - ERROR - [main:AbstractLifecycleComponent@87] - Calling uncaughtExceptionHandler
2023-05-01T00:28:52,411 - ERROR - [main:ComponentStarter@76] - Triggered exceptionHandler of Component: bookie-server because of Exception in Thread: Thread[main,5,main]`
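The `Negative position` exception at the top of both traces is thrown by the JDK, not by BookKeeper: `FileChannel.read(ByteBuffer, long)` rejects any negative offset before touching the file. The minimal sketch below reproduces just that JDK behavior, using a temporary file and the offset from the log. It says nothing about how the index code arrived at that offset. The class name, method name, and file name are made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class NegativePositionDemo {
    // Attempts a positional read; returns the exception message,
    // or null if the read is accepted.
    static String tryRead(FileChannel channel, long position) throws IOException {
        try {
            channel.read(ByteBuffer.allocate(8), position);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("index-demo", ".idx");
        try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.READ)) {
            // A valid offset is accepted (the read just hits end-of-file).
            System.out.println(tryRead(channel, 0L));
            // The offset from the log is rejected with IllegalArgumentException;
            // on OpenJDK the message matches the trace ("Negative position").
            System.out.println(tryRead(channel, -541506176668401664L));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because the check happens on the argument alone, the exception shows only that the offset passed down from `LedgerEntryPage.readPage` was already negative when it reached the file channel. It does not show whether the bad value came from the journal entry being replayed or from the ledger index state.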

Expected behavior

The bookie should replay its journal and start normally after the pod restart.

Screenshots
NAME                READY   STATUS    RESTARTS         AGE
nautilus-bookie-0   0/1     Running   1306 (32h ago)   13d

Additional context

The BookKeeper version in use is 4.14.3 (the jars in the stack trace are labeled 4.14.3-build-437).
