[Feature] Exclude historical balance DBs from lite snapshot #6597

@halibobo1205

Description

Background

Lite nodes retain all state data but keep block and transaction data only for the most recent 65,536 blocks.

toolkit db lite uses an archiveDbs list to determine which databases are excluded when splitting a lite node snapshot:

```java
// plugins/.../DbLite.java:61-66
private static final List<String> archiveDbs = Arrays.asList(
    BLOCK_DB_NAME,                    // "block"
    BLOCK_INDEX_DB_NAME,              // "block-index"
    TRANS_DB_NAME,                    // "trans"
    TRANSACTION_RET_DB_NAME,          // "transactionRetStore"
    TRANSACTION_HISTORY_DB_NAME);     // "transactionHistoryStore"
```

Currently, account-trace and balance-trace are not included in the list above, so they are copied in full into the lite node snapshot. However, both databases are space-intensive and effectively unusable in a lite node context. They are enabled via the CLI flag --history-balance-lookup or the config option storage.balance.history.lookup, are written during block and transaction execution, and serve the historical balance query API (getAccountBalance).

Since lite nodes retain only the most recent 65,536 blocks, historical block data before the snapshot point (e.g., block, trans) is already excluded, and the historical balance query API is inherently unavailable on lite nodes. Retaining these two databases therefore provides no practical value.

| Database | Contents | Measured Size |
| --- | --- | --- |
| balance-trace | Historical balance change records at the block and transaction level | ≈ 690 GB |
| account-trace | Historical account balances indexed by address + block number | ≈ 180 GB |
| Total | Based on the mainnet full node snapshot measured on 2026-03-11 | ≈ 870 GB |

Problem Statement

When toolkit db lite splits a lite node snapshot, account-trace and balance-trace are missing from the archiveDbs exclusion list. As a result, approximately 870 GB of effectively unused data is copied in full into the lite node snapshot, increasing storage costs and data transfer overhead while contributing nothing to lite node functionality.


Rationale

Why should this feature exist?

  • Storage savings: Each split operation avoids copying and transferring approximately 870 GB of unused data.
  • No functional impact: The two excluded databases are not involved in any online state computation on lite nodes; they serve only the historical balance query API, which is already unavailable on lite nodes. Furthermore, both databases belong to an optional feature that most nodes do not enable.

What are the use cases?

  • Node operators who use toolkit db lite to generate lite node snapshots and want to reduce disk usage and network transfer costs.
  • Snapshot distribution scenarios where a significantly smaller snapshot size lowers the barrier to initial sync.
  • Storage-constrained environments where eliminating unused data preserves valuable disk space.

Who would benefit from this feature?

Node operators, lite node deployers, and snapshot distribution service providers.


Proposed Solution

Specification

1. Split (lite)

Add account-trace and balance-trace to the archiveDbs list in DbLite.java so they are automatically excluded when splitting a lite node snapshot, and classified under the archive (historical) portion:

```java
private static final List<String> archiveDbs = Arrays.asList(
    BLOCK_DB_NAME,
    BLOCK_INDEX_DB_NAME,
    TRANS_DB_NAME,
    TRANSACTION_RET_DB_NAME,
    TRANSACTION_HISTORY_DB_NAME,
    ACCOUNT_TRACE_DB_NAME,    // new
    BALANCE_TRACE_DB_NAME);   // new
```
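To illustrate the effect of the extended list, here is a minimal, self-contained sketch of how a split could classify databases by name. It is not the actual DbLite logic: the class `ArchiveSplitSketch` and the method `classify` are hypothetical, the first five string values are taken from the comments in the Background section, and the two trace names are assumed to match the on-disk database names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ArchiveSplitSketch {
  // Assumption: these strings mirror the *_DB_NAME constants; the last two
  // are the proposed additions.
  static final List<String> ARCHIVE_DBS = Arrays.asList(
      "block", "block-index", "trans",
      "transactionRetStore", "transactionHistoryStore",
      "account-trace", "balance-trace");

  /** Returns "archive" for databases excluded from the lite snapshot, "snapshot" otherwise. */
  static String classify(String dbName) {
    return ARCHIVE_DBS.contains(dbName) ? "archive" : "snapshot";
  }

  public static void main(String[] args) {
    // Hypothetical source listing; "account" and "properties" stand in for state databases.
    List<String> sourceDbs = Arrays.asList(
        "account", "block", "balance-trace", "properties", "account-trace");
    Map<String, String> plan = new LinkedHashMap<>();
    for (String db : sourceDbs) {
      plan.put(db, classify(db));
    }
    System.out.println(plan);
  }
}
```

With the two new entries, both trace databases land on the archive side, while state databases are still copied into the lite snapshot.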

2. Merge (merge)

The merge operation combines a lite node snapshot with historical data to restore a full node. Since account-trace and balance-trace are now classified on the archive side, the merge logic must be updated and validated accordingly:

Merge strategy:

| Source | Action |
| --- | --- |
| account-trace / balance-trace from archive | Retain entries with height < lite_height |
| account-trace / balance-trace from lite snapshot | Append only entries with height > archive_height |
| Final result | Two contiguous segments, no overlap, no gap |

  • API Changes: None.
  • Configuration Changes: None.
  • Protocol Changes: None.
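
The merge strategy above could be sketched as follows. This is an illustration under stated assumptions, not the proposed implementation: records are modeled as a map from block height to an opaque string (the real databases use their own key layouts), `liteHeight` is assumed to be the head height of the lite snapshot, and `archiveHeight` is assumed to be the highest height retained from the archive. The class and method names are hypothetical.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

class TraceMergeSketch {
  /**
   * Keeps archive entries with height < liteHeight, then appends lite
   * entries with height > archiveHeight (the highest retained archive height).
   */
  static NavigableMap<Long, String> merge(
      NavigableMap<Long, String> archive,
      NavigableMap<Long, String> lite,
      long liteHeight) {
    NavigableMap<Long, String> merged = new TreeMap<>(archive.headMap(liteHeight, false));
    long archiveHeight = merged.isEmpty() ? -1L : merged.lastKey();
    merged.putAll(lite.tailMap(archiveHeight, false));
    return merged;
  }

  /** True when every height follows its predecessor by exactly one (no gap). */
  static boolean isContiguous(NavigableMap<Long, String> entries) {
    Long prev = null;
    for (Long height : entries.keySet()) {
      if (prev != null && height != prev + 1) {
        return false;
      }
      prev = height;
    }
    return true;
  }

  public static void main(String[] args) {
    NavigableMap<Long, String> archive = new TreeMap<>();
    NavigableMap<Long, String> lite = new TreeMap<>();
    for (long h = 1; h <= 5; h++) archive.put(h, "archive");
    for (long h = 4; h <= 8; h++) lite.put(h, "lite");

    NavigableMap<Long, String> merged = merge(archive, lite, 8L);
    System.out.println(merged + " contiguous=" + isContiguous(merged));
  }
}
```

Because the lite segment is cut strictly above the last retained archive height, the two segments cannot overlap; the contiguity check is what would catch a gap when the lite snapshot does not start right after the archive.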

Testing Strategy

Test Scenarios

  1. After running toolkit db lite, verify that the lite snapshot directory does not contain account-trace or balance-trace, and that the archive directory does contain them (if the source node had the feature enabled).
  2. Compare snapshot sizes before and after the change to confirm the reduction matches expectations (≈ 870 GB for a mainnet source node with historical balance lookup enabled).
  3. Start a lite node from the new snapshot and verify that block sync, state queries, and other core functions work correctly with no errors.
  4. Merge validation (source node with history-balance-lookup enabled): after merging, verify that the full node contains complete account-trace and balance-trace data and that the historical balance query API functions correctly.
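
Scenario 1 could be automated with a small directory check along these lines. This is a sketch only: it assumes each database is a directory directly under the database root of the lite snapshot and of the archive, and the class name, method name, and command-line arguments are hypothetical rather than part of the toolkit.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SnapshotLayoutCheck {
  static final List<String> TRACE_DBS = Arrays.asList("account-trace", "balance-trace");

  /** Returns the trace database names that exist as directories directly under dbRoot. */
  static List<String> traceDbsPresent(Path dbRoot) {
    List<String> found = new ArrayList<>();
    for (String name : TRACE_DBS) {
      if (Files.isDirectory(dbRoot.resolve(name))) {
        found.add(name);
      }
    }
    return found;
  }

  public static void main(String[] args) {
    if (args.length != 2) {
      System.err.println("usage: SnapshotLayoutCheck <lite-db-dir> <archive-db-dir>");
      System.exit(2);
    }
    List<String> inLite = traceDbsPresent(Paths.get(args[0]));
    List<String> inArchive = traceDbsPresent(Paths.get(args[1]));

    // The lite snapshot must contain neither trace database.
    System.out.println("lite snapshot trace DBs: " + inLite);
    // The archive contains them only if the source node had the feature enabled.
    System.out.println("archive trace DBs: " + inArchive);
    System.exit(inLite.isEmpty() ? 0 : 1);
  }
}
```

The check exits with a non-zero status when either trace database is still present in the lite snapshot, which makes it easy to wire into a test script.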

Performance Considerations

The change affects only file-handling logic during the snapshot split and merge phases; it has no impact on node runtime performance.


Scope of Impact

  • Core protocol
  • API/RPC
  • Database
  • Network layer
  • Smart contracts
  • Documentation
  • Other: toolkit db lite.

Breaking Changes

None.

Backward Compatibility

No modifications are made to existing full node database directories; only the behavior of toolkit db lite splitting is affected. Previously generated lite node snapshots are not impacted.


Implementation

Do you have ideas regarding the implementation?

Append ACCOUNT_TRACE_DB_NAME and BALANCE_TRACE_DB_NAME to the archiveDbs constant in DbLite.java, which is a minimal code change. Then add incremental merge logic for these two databases to the merge flow, scoped by block height range.

Are you willing to implement this feature?

  • Yes, I can implement this feature

Estimated Complexity

  • Medium (moderate changes)

Alternatives Considered

None.


Additional Context

Related Issues/PRs

None.

References
