fix: RAC checkpoint resume + heap-use-after-free on shutdown by rophy · Pull Request #6 · rophy/OpenLogReplicator

rophy · 2026-03-23T20:35:59Z

Summary

fix: per-thread archive offset for RAC checkpoint resume — archive single-thread path used global metadata->fileOffset instead of per-thread value, causing ERROR 50051 did not find lwn at offset after kill/restart
fix: heap-use-after-free during error shutdown — WriterStream destructor didn't clear ctx->parserThread, leaving dangling pointer for Builder/Parser destructors
chore: remove LFS checkout from CI — no longer needed after LFS removal from repo

Test plan

2-hour RAC soak test: 40 iterations, 120 kill/restart cycles, zero failures
SQL e2e: free-23 (36 pass), xe-21 (36 pass), RAC (54 pass)
Single-instance checkpoint/restart: 10 cycles pass

Summary by CodeRabbit

Bug Fixes
- Prevented invalid memory access with an added null-safety check in context cleanup.
- Eliminated dangling/stale pointer issues during stream destruction to avoid crashes.
- Improved per-thread context synchronization for more reliable archived log replay.
Chores
- Adjusted CI checkout behavior to stop enabling Git LFS during the test workflow.

LFS is no longer used after removing static fixtures.

WriterStream sets ctx->parserThread = this, but the destructor didn't clear it. When writers are deleted before builders/parsers in ~OpenLogReplicator(), Builder::~Builder() and Parser::~Parser() call freeMemoryChunk(ctx->parserThread, ...) on the freed WriterStream. Fix: null out ctx->parserThread in WriterStream destructor, and guard freeMemoryChunk() against null thread pointer. Found by ASAN during RAC checkpoint/restart fault tolerance testing. Tracked: #5

The archive single-thread processing path used the global metadata->fileOffset instead of the per-thread value from threadStates. After RAC checkpoint resume, thread 1's parser would seek to thread 2's offset, causing ERROR 50051 "did not find lwn at offset". Add per-thread context switch (fileOffset + sequence) before parser->parse() and save-back after, matching what the multi-thread online path already does. Validated with 2-hour soak test: 43 iterations, 129 kill/restart cycles, zero failures.

coderabbitai · 2026-03-23T20:36:15Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0ff854b5-4cd9-4325-8242-ace9bf4800f0

📥 Commits

Reviewing files that changed from the base of the PR and between 9554210 and ecde394.

📒 Files selected for processing (1)

src/common/Ctx.cpp

🚧 Files skipped from review as they are similar to previous changes (1)

src/common/Ctx.cpp

📝 Walkthrough

Walkthrough

This PR removes Git LFS from a CI checkout, adds null-check guards before per-thread context updates, synchronizes per-thread fileOffset/sequence around archived log parsing, and clears stale parser thread pointers in WriterStream destructor to prevent use-after-free.

Changes

Cohort / File(s)	Summary
CI Configuration `.github/workflows/redo-log-tests.yaml`	Removed `lfs: true` option from `actions/checkout@v4` in the `test` job.
Memory safety / Context guards `src/common/Ctx.cpp`	Guard `t->contextSet(...)` calls with `if (likely(t != nullptr))` to avoid dereferencing null/stale thread pointers.
Destructor cleanup `src/writer/WriterStream.cpp`	Clear `ctx->parserThread` when it points to the destroying `WriterStream` instance to avoid stale pointer use during later free operations.
Per-thread state sync during parsing `src/replicator/Replicator.cpp`	In single-thread archived-log path, restore per-thread `fileOffset`/`sequence` into shared metadata before `parser->parse()`, then save updated values back into the thread's saved state after parse, ensuring correct per-thread resume behavior.

Sequence Diagram(s)

sequenceDiagram
    participant Replicator
    participant Metadata
    participant Parser
    participant ThreadState

    Replicator->>Metadata: select bestThread
    Replicator->>ThreadState: load saved fileOffset, sequence for bestThread
    ThreadState-->>Metadata: set metadata.fileOffset, metadata.sequence
    Replicator->>Parser: parser->parse(...)  -- executes with restored metadata
    Parser-->>Metadata: update fileOffset, sequence during parse
    Replicator->>ThreadState: save metadata.fileOffset, metadata.sequence back to thread state
    Replicator->>Metadata: setFirstNextScn(...)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

bug: duplicate SYS.CCOL$ error caused by ADAPTIVE_SCHEMA (flags: 4) + heap-use-after-free on shutdown #5: Matches the heap-use-after-free fixes (nulling ctx->parserThread and adding t != nullptr guard) described in that issue.

Poem

🐰 I nibbled stale pointers in the night,

Cleared a thread so memory sleeps tight,
Sync'd offsets, hopped through parsing gates,
CI chewed less LFS on its plates,
Hooray — no more bugs to give me frights! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the two main fixes in the changeset: RAC checkpoint resume bug and heap-use-after-free issue on shutdown.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/rac-checkpoint-resume

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/common/Ctx.cpp (1)

531-558: ⚠️ Potential issue | 🔴 Critical

Null dereference is still possible in freeMemoryChunk.

Line 531 adds a null check, but Line 554 and Line 558 still dereference t unconditionally. With ctx->parserThread now being set to nullptr during teardown, this path can still crash.

🐛 Minimal fix

-    if (chunk != nullptr) {
-        t->contextSet(Thread::CONTEXT::OS, Thread::REASON::OS);
-        free(chunk);
-    }
-
-    t->contextSet(Thread::CONTEXT::CPU);
+    if (chunk != nullptr) {
+        if (likely(t != nullptr))
+            t->contextSet(Thread::CONTEXT::OS, Thread::REASON::OS);
+        free(chunk);
+    }
+
+    if (likely(t != nullptr))
+        t->contextSet(Thread::CONTEXT::CPU);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/common/Ctx.cpp` around lines 531 - 558, The code still dereferences the
Thread pointer t unconditionally later in freeMemoryChunk; wrap the two later
uses of t->contextSet (the call before free(chunk) and the final
t->contextSet(Thread::CONTEXT::CPU)) with a null check (e.g., only call
t->contextSet(...) if t != nullptr) to avoid dereferencing ctx->parserThread
after teardown; locate these calls in Ctx.cpp within the freeMemoryChunk path
and guard them the same way the early contextSet is guarded.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/common/Ctx.cpp`:
- Around line 531-558: The code still dereferences the Thread pointer t
unconditionally later in freeMemoryChunk; wrap the two later uses of
t->contextSet (the call before free(chunk) and the final
t->contextSet(Thread::CONTEXT::CPU)) with a null check (e.g., only call
t->contextSet(...) if t != nullptr) to avoid dereferencing ctx->parserThread
after teardown; locate these calls in Ctx.cpp within the freeMemoryChunk path
and guard them the same way the early contextSet is guarded.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0023f2be-cb1e-42e1-800c-ae0c6fe4713d

📥 Commits

Reviewing files that changed from the base of the PR and between 911ba28 and 9554210.

📒 Files selected for processing (4)

.github/workflows/redo-log-tests.yaml
src/common/Ctx.cpp
src/replicator/Replicator.cpp
src/writer/WriterStream.cpp

💤 Files with no reviewable changes (1)

.github/workflows/redo-log-tests.yaml

CodeRabbit review caught two more unguarded dereferences of the Thread pointer after the null check at line 531. Lines 554 and 558 still called t->contextSet() unconditionally, which would crash when ctx->parserThread is null during teardown.

rophy · 2026-03-23T23:51:09Z

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
🤖 Prompt for all review comments with AI agents
ℹ️ Review info

@coderabbitai fixed, can you resolve the comment?

coderabbitai · 2026-03-23T23:52:20Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# List review comments made by coderabbitai on PR `#6`
gh api repos/rophy/OpenLogReplicator/pulls/6/comments --jq '.[] | select(.user.login == "coderabbitai") | {id: .id, body: .body[0:200], path: .path}'

Length of output: 0

🏁 Script executed:

#!/bin/bash
# Check general PR comments by coderabbitai
gh api repos/rophy/OpenLogReplicator/issues/6/comments --jq '.[] | select(.user.login == "coderabbitai") | {id: .id, body: .body[0:300]}'

echo "---REVIEWS---"
# Check review threads
gh api repos/rophy/OpenLogReplicator/pulls/6/reviews --jq '.[] | select(.user.login == "coderabbitai") | {id: .id, body: .body[0:300], state: .state}'

Length of output: 13

🏁 Script executed:

#!/bin/bash
# List all reviews on the PR
gh pr view 6 --repo rophy/OpenLogReplicator --json reviews,comments --jq '.'

echo "---ALL REVIEW COMMENTS---"
gh api repos/rophy/OpenLogReplicator/pulls/6/comments --jq '.[] | {id: .id, user: .user.login, body: .body[0:200], path: .path}'

Length of output: 19364

🏁 Script executed:

#!/bin/bash
# Try to find resolvable review threads via GraphQL
gh api graphql -f query='
{
  repository(owner: "rophy", name: "OpenLogReplicator") {
    pullRequest(number: 6) {
      reviewThreads(first: 20) {
        nodes {
          id
          isResolved
          comments(first: 1) {
            nodes {
              author { login }
              body
            }
          }
        }
      }
    }
  }
}'

Length of output: 70

@rophy: The fix looks great — the latest commit correctly guards the remaining two t->contextSet(...) calls in freeMemoryChunk with if (likely(t != nullptr)), which is exactly what was flagged.

Unfortunately, since the comment was flagged as an "outside diff range" comment, it was embedded in the review body rather than posted as an inline thread — so there is no resolvable thread to dismiss programmatically. The review is effectively addressed by the fix itself. ✅

If you found this review helpful, would you consider giving us a shout-out on X?

Thank you for using CodeRabbit!

rophy added 3 commits March 23, 2026 20:35

chore: remove LFS checkout from CI workflow

9a7bbe6

LFS is no longer used after removing static fixtures.

coderabbitai bot reviewed Mar 23, 2026

View reviewed changes

rophy merged commit c81e190 into master Mar 23, 2026
2 checks passed

rophy deleted the fix/rac-checkpoint-resume branch March 23, 2026 23:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: RAC checkpoint resume + heap-use-after-free on shutdown#6

fix: RAC checkpoint resume + heap-use-after-free on shutdown#6
rophy merged 4 commits intomasterfrom
fix/rac-checkpoint-resume

rophy commented Mar 23, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Mar 23, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Uh oh!

rophy commented Mar 23, 2026

Uh oh!

coderabbitai bot commented Mar 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rophy commented Mar 23, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

rophy commented Mar 23, 2026

Uh oh!

coderabbitai bot commented Mar 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

rophy commented Mar 23, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 23, 2026 •

edited

Loading