Skip to content

Conversation

@MegaRedHand
Copy link
Collaborator

@MegaRedHand MegaRedHand commented Feb 11, 2026

This PR addresses possible OOM issues during long chain syncs.

Previously, we stored pending blocks in memory.

Now, we store them in the DB, reducing the memory footprint. We still keep some metadata in memory, but it shouldn't bring problems on reasonable sync scenarios.

@github-actions
Copy link

🤖 Kimi Code Review

Security & Correctness Issues

1. Race Condition in insert_pending_block (store.rs:546-550)

The insert_pending_block function creates a write batch and commits it immediately, but there's no atomicity guarantee between the DB write and the in-memory state updates in on_block. If the process crashes after DB write but before in-memory state is updated, we could have inconsistent state.

Fix: Consider making insert_pending_block return the write batch to be committed atomically with other operations.

2. Potential Infinite Loop in on_block (blockchain.rs:316-334)

The while loop that walks up the chain could theoretically enter an infinite loop if there are cycles in the block graph (though this shouldn't happen with valid blocks).

Fix: Add a maximum depth limit or cycle detection:

let mut depth = 0;
const MAX_PENDING_DEPTH: u64 = 1024; // Reasonable limit
while depth < MAX_PENDING_DEPTH && let Some(header) = ... {
    depth += 1;
    ...
}

3. Missing Error Handling in insert_pending_block (store.rs:548-549)

The expect("write batch") and expect("commit") could panic on disk I/O errors.

Fix: Propagate errors instead of panicking:

pub fn insert_pending_block(&mut self, root: H256, signed_block: SignedBlockWithAttestation) -> Result<(), StorageError> {
    let mut batch = self.backend.begin_write()?;
    write_signed_block(batch.as_mut(), &root, signed_block);
    batch.commit()
}

4. Inconsistent State in process_pending_children (blockchain.rs:384-408)

If get_signed_block fails (returns None), the block root remains in pending_block_parents but is removed from pending_blocks. This creates inconsistent tracking.

Fix: Clean up pending_block_parents even when block is missing:

self.pending_block_parents.remove(&block_root); // Move this before the continue

Performance & Design Issues

5. Redundant DB Lookups in on_block (blockchain.rs:316-334)

The while loop performs multiple DB lookups for the same block. Consider caching the results.

6. Memory Usage in pending_blocks

The change from storing full SignedBlockWithAttestation to just H256 is good for memory, but the pending_blocks HashMap could still grow unbounded if blocks arrive out of order frequently.

Suggestion: Add a cleanup mechanism for very old pending entries.

Minor Issues

7. Code Duplication in write_signed_block

The logic for writing block components is duplicated between insert_signed_block and insert_pending_block. The extraction to write_signed_block is good, but insert_signed_block could use it more directly.

8. Inconsistent Logging

Some trace logs use trace!(%parent_root, %slot, ...) while others use block_root = %ShortRoot(&block_root.0). Consider standardizing the format.

Summary

The PR improves memory usage by storing only references in memory while persisting full blocks to DB. However, there are some edge cases around error handling and state consistency that should be addressed.


Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions
Copy link

🤖 Codex Code Review

Found a few correctness/robustness issues tied to the new pending-block persistence. None of the consensus-layer algorithms were changed directly, but the new storage path affects block ingestion reliability.

Findings (ordered by severity)

  • Potential crash on disk errors for untrusted data: insert_pending_block uses expect on begin_write and commit, so any storage hiccup while handling a gossiped block can panic the node (DoS vector). Consider returning Result and handling/logging in the caller. crates/storage/src/store.rs:545-549, call site crates/blockchain/src/lib.rs:307-309.
  • Potential crash on partial/corrupt DB state: in the missing-ancestor walk, if the header exists but the signed block doesn’t, expect panics. This can be triggered after an unclean shutdown or DB corruption, again from network input. Should degrade gracefully and re-request the block. crates/blockchain/src/lib.rs:321-324.
  • Lost pending child if DB entry is missing: in process_pending_children, if get_signed_block returns None, you log and drop it, so the block is never re-requested. This can permanently orphan children after a DB cleanup/corruption. Suggest re-issuing FetchBlock for block_root before skipping. crates/blockchain/src/lib.rs:392-398.

Notes

  • The fork-choice visibility change (pending blocks excluded from LiveChain) is good for correctness. No changes in LMD GHOST / 3SF-mini, attestation validation, justification/finalization, or SSZ logic in this diff.
  • Performance: persisting pending blocks trades RAM for disk IO. That’s acceptable, but consider a cap/eviction strategy to avoid unbounded disk growth from spam.

If you want, I can propose concrete error-handling changes for the storage writes and missing-block recovery.


Automated review by OpenAI Codex · custom prompt

@github-actions
Copy link

🤖 Claude Code Review

Let me review the code now. I have enough information to provide a comprehensive review.

PR #116: Store pending blocks in db - Code Review

Summary

This PR addresses OOM issues during long chain syncs by moving pending block storage from memory to the database. The approach is sound - keeping only H256 references in memory while persisting the heavy signature data (~3KB+ per block) to disk.

Critical Issues

🔴 CRITICAL: Potential Infinite Loop in DB Walk

Location: crates/blockchain/src/lib.rs:318-337

The new DB walk logic has a potential infinite loop if there's a cycle in the chain or corrupted data:

while let Some(header) = self.store.get_block_header(&missing_root) {
    if self.store.has_state(&header.parent_root) {
        // ...process
        return;
    }
    // ...register as pending
    missing_root = header.parent_root;
}

Risk: If header.parent_root == missing_root (self-referencing block) or there's a cycle in the chain, this will loop forever.

Recommendation: Add cycle detection:

let mut visited = HashSet::new();
while let Some(header) = self.store.get_block_header(&missing_root) {
    if !visited.insert(missing_root) {
        warn!(%missing_root, "Cycle detected in pending block chain");
        break;
    }
    // ... rest of loop
}

🟡 Memory Leak: No Cleanup of Orphaned Pending Blocks

Location: crates/blockchain/src/lib.rs:308 and crates/storage/src/store.rs:545

Once blocks are written to the DB via insert_pending_block(), they remain there forever even if:

  1. The block becomes invalid
  2. The parent never arrives (orphaned fork)
  3. Finalization makes them irrelevant

Current behavior:

  • Memory structures (pending_blocks, pending_block_parents) get cleaned up
  • DB entries persist indefinitely

Impact: During long syncs with many forks, this accumulates dead data in the DB.

Recommendation: Add a pruning mechanism:

  • Option 1: Delete pending blocks from DB when finalization advances past them
  • Option 2: Add a timestamp/slot threshold for orphan cleanup
  • Option 3: Document this as known behavior if acceptable for the design

🟡 Duplicate Block Processing Risk

Location: crates/blockchain/src/lib.rs:318-326

When a pending block is found in DB with available parent state, it's processed immediately:

if self.store.has_state(&header.parent_root) {
    let block = self.store.get_signed_block(&missing_root)
        .expect("signed block exists when header exists");
    self.on_block(block);
    return;
}

Issue: This recursively calls on_block(), which may:

  1. Re-insert the same block via insert_pending_block() if there's a race condition
  2. Process the block twice if it's also in the network queue

Recommendation: Check if block is already in LiveChain before processing:

if self.store.has_state(&header.parent_root) {
    // Skip if already processed (check LiveChain)
    if !self.store.is_block_in_live_chain(&missing_root) {
        let block = self.store.get_signed_block(&missing_root)
            .expect("signed block exists when header exists");
        self.on_block(block);
    }
    return;
}

Moderate Issues

🟢 Missing Error Context in DB Read

Location: crates/blockchain/src/lib.rs:393-398

let Some(child_block) = self.store.get_signed_block(&block_root) else {
    warn!(
        block_root = %ShortRoot(&block_root.0),
        "Pending block missing from DB, skipping"
    );
    continue;
};

Issue: This is logged as a warning but may indicate a critical bug (block should always exist if we stored it as pending).

Recommendation: Consider making this an error-level log or even a panic in debug mode, as it indicates data corruption or a logic bug.

🟢 Inconsistent State Between Memory and DB

Location: crates/blockchain/src/lib.rs:308-314

The code stores to DB first, then updates memory structures. If the process crashes between these operations:

  • DB has the block
  • Memory structures don't reference it
  • Block becomes orphaned until the next restart

Current mitigation: The DB walk (lines 318-337) recovers from this on restart.

Verdict: Acceptable due to recovery logic, but worth documenting this behavior.

Minor Issues / Style

🔵 Unclear Variable Naming

Location: crates/blockchain/src/lib.rs:300-303

let mut missing_root = parent_root;
while let Some(&ancestor) = self.pending_block_parents.get(&missing_root) {
    missing_root = ancestor;
}

The variable missing_root is confusing - it's actually the "deepest known missing ancestor". Consider renaming to deepest_missing_ancestor or root_missing_ancestor.

🔵 Documentation Inconsistency

Location: crates/storage/src/store.rs:543-544

The doc comment says:

When the block is later processed via insert_signed_block, the same keys are overwritten (idempotent)

This is correct but could emphasize that both methods write to the same tables, so there's no cleanup needed when transitioning from pending→processed.

🔵 Extract Magic Method Name

Location: crates/storage/src/store.rs:877-923

The new write_signed_block() helper is well-designed and follows the extraction pattern from CLAUDE.md. Good job extracting duplicate code!

Suggestion: Consider making this pub(crate) if other storage modules might need it, otherwise keep as module-private.

Positive Observations

Good use of ownership patterns - Returning the deserialized Block from write_signed_block() avoids re-deserialization
Idempotent design - Overwriting blocks on both pending→processed transitions is clean
Maintains fork choice isolation - Pending blocks correctly excluded from LiveChain
Memory efficiency - Storing only H256 references achieves the goal
Clear separation of concerns - Storage logic properly encapsulated in store.rs

Security Considerations

Denial of Service Vectors

  1. Disk Space Exhaustion: A malicious peer could send many invalid pending blocks to fill disk

    • Mitigation needed: Add limits on pending block count or total size
  2. CPU Exhaustion: The DB walk loop could be exploited with long chains

    • Mitigation: The cycle detection fix above addresses this

Testing Recommendations

Given the complexity of this change, consider adding tests for:

  1. Restart recovery: Process crashes after insert_pending_block() but before memory updates
  2. Deep chain sync: Chain with 100+ pending blocks needing ancestry resolution
  3. Orphan cleanup: Verify behavior when parent never arrives
  4. Cycle detection: Corrupted DB with circular block references
  5. Concurrent processing: Multiple blocks arriving while processing pending queue

Overall Assessment

Verdict: Approve with required changes

The core approach is solid and achieves the OOM fix goal. However, the infinite loop vulnerability is critical and must be fixed before merge. The memory leak concern should be addressed with either cleanup logic or explicit documentation of the trade-off.

The code quality is good overall - proper use of Rust idioms, clear documentation, and good separation of concerns. The idempotent design is particularly elegant.

Required before merge:

  • ✋ Fix infinite loop with cycle detection
  • ✋ Address orphaned pending blocks (cleanup or document)

Recommended improvements:

  • 🤚 Add duplicate processing check
  • 🤚 Upgrade missing block warning to error
  • 🤚 Add test coverage for edge cases

Automated review by Claude (Anthropic) · custom prompt

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 11, 2026

Greptile Overview

Greptile Summary

This PR refactors pending block storage to address OOM issues during long chain syncs by moving block data from memory to the database.

Key changes:

  • pending_blocks map now stores Vec<H256> (block roots) instead of Vec<SignedBlockWithAttestation> (full blocks), reducing memory footprint significantly
  • Added Store::insert_pending_block() to persist block data (headers, bodies, signatures) without creating LiveChain entries, keeping blocks invisible to fork choice
  • Extracted duplicate block writing logic into reusable write_signed_block() helper function
  • Added DB walk-up logic in on_block() to handle pending blocks persisted from previous sessions, checking if they can now be processed

Implementation details:

  • When a block arrives with missing parent, it's now persisted to DB via insert_pending_block() and only its H256 root is stored in memory
  • On restart, if a missing ancestor is already in DB from a previous session, the code walks up the chain to find the true missing ancestor
  • process_pending_children() now loads blocks from DB using get_signed_block() before processing

Confidence Score: 4/5

  • This PR is safe to merge with minor considerations around the DB walk-up logic
  • The refactoring effectively addresses the OOM issue by moving large signature data (~3KB+ per block) from memory to DB. The code is well-structured with good separation of concerns. The DB walk-up logic (lines 318-337) handles the edge case of pending blocks from previous sessions correctly. However, there's a potential edge case where the recursive on_block call at line 325 could theoretically cause stack overflow with extremely deep chains, though this is unlikely in practice given blockchain constraints.
  • No files require special attention - both changes are straightforward refactorings

Important Files Changed

Filename Overview
crates/blockchain/src/lib.rs Changed pending blocks from storing full blocks in memory to storing only H256 references, with block data persisted in DB. Added DB walk-up logic to handle pending blocks from previous sessions.
crates/storage/src/store.rs Extracted duplicate block writing code into write_signed_block helper function and added insert_pending_block method to store blocks without LiveChain entries.

Sequence Diagram

sequenceDiagram
    participant P2P
    participant BlockChain
    participant Store
    participant DB

    Note over BlockChain: Block arrives with missing parent
    P2P->>BlockChain: on_block(signed_block)
    BlockChain->>Store: has_state(parent_root)?
    Store-->>BlockChain: false
    
    Note over BlockChain: Persist block data to DB
    BlockChain->>Store: insert_pending_block(block_root, signed_block)
    Store->>DB: write_signed_block (headers/bodies/signatures)
    Note over DB: No LiveChain entry created
    
    Note over BlockChain: Store only H256 in memory
    BlockChain->>BlockChain: pending_blocks[parent_root].push(block_root)
    
    Note over BlockChain: Walk DB for persisted ancestors
    loop While header exists in DB
        BlockChain->>Store: get_block_header(missing_root)
        Store-->>BlockChain: header
        BlockChain->>Store: has_state(header.parent_root)?
        alt State available
            Store-->>BlockChain: true
            BlockChain->>Store: get_signed_block(missing_root)
            Store-->>BlockChain: signed_block
            BlockChain->>BlockChain: on_block (recursive cascade)
        else State not available
            Store-->>BlockChain: false
            BlockChain->>BlockChain: Register as pending
            Note over BlockChain: Update missing_root to header.parent_root
        end
    end
    
    BlockChain->>P2P: FetchBlock(missing_root)
    
    Note over BlockChain: Later - parent arrives
    P2P->>BlockChain: on_block(parent_block)
    BlockChain->>BlockChain: process_pending_children(block_root)
    loop For each child_root
        BlockChain->>Store: get_signed_block(child_root)
        Store->>DB: Fetch headers/bodies/signatures
        Store-->>BlockChain: signed_block
        BlockChain->>BlockChain: on_block (recursive)
    end
Loading

@MegaRedHand MegaRedHand changed the title Store pending blocks in db feat: store pending blocks in db Feb 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants