fix: RAC checkpoint resume + heap-use-after-free on shutdown#6
Conversation
LFS is no longer used after removing static fixtures.
WriterStream sets ctx->parserThread = this, but the destructor didn't clear it. When writers are deleted before builders/parsers in ~OpenLogReplicator(), Builder::~Builder() and Parser::~Parser() call freeMemoryChunk(ctx->parserThread, ...) on the freed WriterStream. Fix: null out ctx->parserThread in WriterStream destructor, and guard freeMemoryChunk() against null thread pointer. Found by ASAN during RAC checkpoint/restart fault tolerance testing. Tracked: #5
The archive single-thread processing path used the global metadata->fileOffset instead of the per-thread value from threadStates. After RAC checkpoint resume, thread 1's parser would seek to thread 2's offset, causing ERROR 50051 "did not find lwn at offset". Add per-thread context switch (fileOffset + sequence) before parser->parse() and save-back after, matching what the multi-thread online path already does. Validated with 2-hour soak test: 43 iterations, 129 kill/restart cycles, zero failures.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR removes Git LFS from a CI checkout, adds null-check guards before per-thread context updates, synchronizes per-thread fileOffset/sequence around archived log parsing, and clears stale parser thread pointers in WriterStream destructor to prevent use-after-free. Changes
Sequence Diagram(s)sequenceDiagram
participant Replicator
participant Metadata
participant Parser
participant ThreadState
Replicator->>Metadata: select bestThread
Replicator->>ThreadState: load saved fileOffset, sequence for bestThread
ThreadState-->>Metadata: set metadata.fileOffset, metadata.sequence
Replicator->>Parser: parser->parse(...) -- executes with restored metadata
Parser-->>Metadata: update fileOffset, sequence during parse
Replicator->>ThreadState: save metadata.fileOffset, metadata.sequence back to thread state
Replicator->>Metadata: setFirstNextScn(...)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/common/Ctx.cpp (1)
531-558:⚠️ Potential issue | 🔴 CriticalNull dereference is still possible in
freeMemoryChunk.Line 531 adds a null check, but Line 554 and Line 558 still dereference
tunconditionally. Withctx->parserThreadnow being set tonullptrduring teardown, this path can still crash.🐛 Minimal fix
- if (chunk != nullptr) { - t->contextSet(Thread::CONTEXT::OS, Thread::REASON::OS); - free(chunk); - } - - t->contextSet(Thread::CONTEXT::CPU); + if (chunk != nullptr) { + if (likely(t != nullptr)) + t->contextSet(Thread::CONTEXT::OS, Thread::REASON::OS); + free(chunk); + } + + if (likely(t != nullptr)) + t->contextSet(Thread::CONTEXT::CPU);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/common/Ctx.cpp` around lines 531 - 558, The code still dereferences the Thread pointer t unconditionally later in freeMemoryChunk; wrap the two later uses of t->contextSet (the call before free(chunk) and the final t->contextSet(Thread::CONTEXT::CPU)) with a null check (e.g., only call t->contextSet(...) if t != nullptr) to avoid dereferencing ctx->parserThread after teardown; locate these calls in Ctx.cpp within the freeMemoryChunk path and guard them the same way the early contextSet is guarded.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/common/Ctx.cpp`:
- Around line 531-558: The code still dereferences the Thread pointer t
unconditionally later in freeMemoryChunk; wrap the two later uses of
t->contextSet (the call before free(chunk) and the final
t->contextSet(Thread::CONTEXT::CPU)) with a null check (e.g., only call
t->contextSet(...) if t != nullptr) to avoid dereferencing ctx->parserThread
after teardown; locate these calls in Ctx.cpp within the freeMemoryChunk path
and guard them the same way the early contextSet is guarded.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0023f2be-cb1e-42e1-800c-ae0c6fe4713d
📒 Files selected for processing (4)
.github/workflows/redo-log-tests.yamlsrc/common/Ctx.cppsrc/replicator/Replicator.cppsrc/writer/WriterStream.cpp
💤 Files with no reviewable changes (1)
- .github/workflows/redo-log-tests.yaml
CodeRabbit review caught two more unguarded dereferences of the Thread pointer after the null check at line 531. Lines 554 and 558 still called t->contextSet() unconditionally, which would crash when ctx->parserThread is null during teardown.
@coderabbitai fixed, can you resolve the comment? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 13 🏁 Script executed: Length of output: 19364 🏁 Script executed: Length of output: 70
Unfortunately, since the comment was flagged as an "outside diff range" comment, it was embedded in the review body rather than posted as an inline thread — so there is no resolvable thread to dismiss programmatically. The review is effectively addressed by the fix itself. ✅ If you found this review helpful, would you consider giving us a shout-out on X? Thank you for using CodeRabbit! |
Summary
metadata->fileOffsetinstead of per-thread value, causingERROR 50051 did not find lwn at offsetafter kill/restartWriterStreamdestructor didn't clearctx->parserThread, leaving dangling pointer forBuilder/ParserdestructorsTest plan
Summary by CodeRabbit
Bug Fixes
Chores