Fix: Garbage collection stuck on corrupt entry log file #4544

Open: dlg99 wants to merge 1 commit into base: master
@dlg99 dlg99 commented Jan 8, 2025

Motivation

A corrupt entry log file causes an OutOfDirectMemoryError (OODME) during garbage collection and leaves GC stuck.

Log (from BookKeeper 4.15.x):

2025-01-03T00:24:55,037+0000 [GarbageCollectorThread-7-1] INFO  org.apache.bookkeeper.bookie.GarbageCollectorThread - Extracting entry log meta from entryLogId: 45795
2025-01-03T00:24:55,038+0000 [GarbageCollectorThread-7-1] INFO  org.apache.bookkeeper.bookie.EntryLogger - Failed to get ledgers map index from: 45795.log : Negative position
2025-01-03T00:24:55,039+0000 [GarbageCollectorThread-7-1] ERROR org.apache.bookkeeper.common.util.SafeRunnable - Unexpected throwable caught
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 1936946533 byte(s) of direct memory (used: 1140850688, max: 2147483648)
        at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:845) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:774) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:701) ~[io.netty-netty-buffer-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.buffer.PoolArena$DirectArena.newUnpooledChunk(PoolArena.java:690) ~[io.netty-netty-buffer-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:226) ~[io.netty-netty-buffer-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:144) ~[io.netty-netty-buffer-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.buffer.PoolArena.reallocate(PoolArena.java:302) ~[io.netty-netty-buffer-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:122) ~[io.netty-netty-buffer-4.1.86.Final.jar:4.1.86.Final]
        at org.apache.bookkeeper.bookie.EntryLogger.scanEntryLog(EntryLogger.java:1030) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
        at org.apache.bookkeeper.bookie.EntryLogger.extractEntryLogMetadataByScanning(EntryLogger.java:1168) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
        at org.apache.bookkeeper.bookie.EntryLogger.getEntryLogMetadata(EntryLogger.java:1071) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
        at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:758) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
        at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:411) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
        at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:391) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
        at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.15.4.jar:4.15.4]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
        at java.lang.Thread.run(Thread.java:833) ~[?:?]

I don't have access to the environment. AFAIK there is enough direct memory, and other entry logs can be compacted without problems.
I don't know how the file got corrupted.

Changes

Handle the exception, log it, and skip the corrupt file, similar to #3901.
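The handle/log/skip approach can be illustrated with a minimal, self-contained sketch. This is not the code in this PR: the class `SkipCorruptEntryLogSketch`, the method `extractAll`, and the `extractor` callback are hypothetical names used only for illustration. It assumes the failure surfaces as an `OutOfMemoryError` subclass (Netty's `OutOfDirectMemoryError` is one), so catching `Exception` alone would not be enough.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongFunction;

// Hypothetical sketch, not BookKeeper code: one bad entry log must not abort the whole GC pass.
final class SkipCorruptEntryLogSketch {

    /** Outcome of one pass: ids whose metadata was extracted and ids that were skipped. */
    static final class ScanResult {
        final List<Long> extracted = new ArrayList<>();
        final List<Long> skipped = new ArrayList<>();
    }

    /**
     * Runs the (hypothetical) extractor on every entry log id.
     * A failure on one id is logged and that id is skipped, so later ids are still processed.
     */
    static ScanResult extractAll(List<Long> entryLogIds, LongFunction<String> extractor) {
        ScanResult result = new ScanResult();
        for (long id : entryLogIds) {
            try {
                extractor.apply(id);
                result.extracted.add(id);
            } catch (Exception | OutOfMemoryError e) {
                // OutOfMemoryError is an Error, not an Exception, so it is caught explicitly.
                System.err.println("Skipping entry log " + id + ": " + e);
                result.skipped.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Simulate a corrupt file: id 2 fails the way the log above does.
        ScanResult r = extractAll(Arrays.asList(1L, 2L, 3L), id -> {
            if (id == 2L) {
                throw new OutOfMemoryError("simulated allocation failure");
            }
            return "meta-" + id;
        });
        System.out.println("extracted=" + r.extracted + " skipped=" + r.skipped);
    }
}
```

In this sketch the loop continues past the failing id, which is the behavior the change aims for: the corrupt file is reported and left alone while the remaining entry logs are still considered for compaction.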

@dlg99 dlg99 requested review from hangc0276 and eolivelli January 8, 2025 18:09