[Feature] Remove periodic database backup in favor of dual-node failover #6595

@liuyifei001

Description

Problem Statement

The database backup feature was originally introduced to enable fast recovery after disk failures, and it was useful in the early stages of chain development. Earlier work also tried to improve the feature, for example by completing the backup database implementation.

However, as the database has grown rapidly, the feature now causes serious side effects. Most notably, a long-running backup stalls block synchronization, so the drawbacks significantly outweigh the benefits.

Proposed Solution

Why should it be removed?

The database backup feature is configured as follows:

storage.backup = {
  enable = false                // whether to enable the backup plugin
  propPath = "prop.properties"  // records which backup directory is currently valid
  bak1path = "bak1/database"    // two backup directories must be set to guard against an unexpected application halt (e.g. kill -9)
  bak2path = "bak2/database"
  frequency = 10000             // back up the database once every 10000 blocks processed
}

When this feature is enabled (enable = true), pushBlock(final BlockCapsule block) checks whether the block number is a multiple of frequency (block number % frequency == 0). If it is, every database that implements the RevokingDatabase interface is copied to an alternate directory.
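
The trigger condition described above can be sketched as follows. This is a minimal illustration, not the java-tron implementation: the class and method names are hypothetical, and the strict alternation between bak1path and bak2path is an assumption inferred from the two-directory configuration.

```java
/** Illustrative sketch only; names are hypothetical and not taken from java-tron. */
final class BackupTriggerSketch {
    private final boolean enabled;
    private final long frequency;
    private final String bak1path;
    private final String bak2path;
    private boolean useFirst = true; // ASSUMPTION: targets simply alternate

    BackupTriggerSketch(boolean enabled, long frequency, String bak1path, String bak2path) {
        if (frequency <= 0) {
            throw new IllegalArgumentException("frequency must be positive");
        }
        this.enabled = enabled;
        this.frequency = frequency;
        this.bak1path = bak1path;
        this.bak2path = bak2path;
    }

    /** True when the feature is on and the block number is a multiple of frequency. */
    boolean shouldBackup(long blockNumber) {
        return enabled && blockNumber % frequency == 0;
    }

    /** Returns the directory for the next copy and flips to the other one. */
    String nextTarget() {
        String target = useFirst ? bak1path : bak2path;
        useFirst = !useFirst;
        return target;
    }

    public static void main(String[] args) {
        BackupTriggerSketch trigger =
            new BackupTriggerSketch(true, 10_000, "bak1/database", "bak2/database");
        for (long blockNumber = 19_999; blockNumber <= 30_000; blockNumber++) {
            if (trigger.shouldBackup(blockNumber)) {
                System.out.println("block " + blockNumber + " -> copy to " + trigger.nextTarget());
            }
        }
    }
}
```

Because the check runs inside block processing, the copy sits on the synchronization path: the node cannot move on to the next block until the copy has finished.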

This mechanism was designed to address data corruption caused by sporadic disk failures, power outages, or abrupt process termination (e.g. kill -9) in the early stages of deployment.

Current Major Issues:

  • After optimization and extensive testing, it has been confirmed that kill -9 does not corrupt the database, which significantly reduces the necessity of periodic database backups.

  • State-related databases are extremely large. As of 2026-01-28, the mainnet state database is close to 3 TB, and a single copy operation can take several hours. The node cannot synchronize blocks during the backup window, which poses a serious risk to service stability.
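
To put the several-hour figure in perspective, the sketch below estimates the length of the backup window. The 3 TB size and frequency = 10000 come from the text above. The 200 MB/s sustained copy throughput is a purely hypothetical number, and the 3-second block interval is an assumption about mainnet block production. All names are illustrative.

```java
/** Back-of-the-envelope estimate; throughput and block interval are assumptions. */
final class BackupWindowEstimate {

    /** Seconds needed to copy dbBytes at a sustained rate, rounded up. */
    static long copySeconds(long dbBytes, long bytesPerSecond) {
        if (dbBytes < 0 || bytesPerSecond <= 0) {
            throw new IllegalArgumentException("dbBytes must be >= 0 and bytesPerSecond > 0");
        }
        return (dbBytes + bytesPerSecond - 1) / bytesPerSecond;
    }

    /** Number of blocks the network produces during the given time span. */
    static long blocksProducedDuring(long seconds, long blockIntervalSeconds) {
        if (seconds < 0 || blockIntervalSeconds <= 0) {
            throw new IllegalArgumentException("seconds must be >= 0 and interval > 0");
        }
        return seconds / blockIntervalSeconds;
    }

    public static void main(String[] args) {
        long dbBytes = 3_000_000_000_000L; // ~3 TB, from the issue text
        long throughput = 200_000_000L;    // ASSUMPTION: 200 MB/s sustained copy rate
        long blockInterval = 3;            // ASSUMPTION: one block every 3 seconds
        long frequency = 10_000;           // from storage.backup.frequency

        long copy = copySeconds(dbBytes, throughput);
        long missed = blocksProducedDuring(copy, blockInterval);
        long secondsBetweenBackups = frequency * blockInterval;

        System.out.println("copy time (s): " + copy);
        System.out.println("blocks produced during copy: " + missed);
        System.out.println("chain time between backups (s): " + secondsBetweenBackups);
    }
}
```

Under these assumptions a single copy takes 15,000 seconds (about 4.2 hours), during which roughly 5,000 blocks are produced, while a new backup falls due after every 30,000 seconds of chain time. Real throughput depends on the disk and on concurrent load, so the numbers are indicative only.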

Alternative Solution

An alternative to database backup is to deploy dual FullNodes in a primary–backup configuration. The configuration is as follows:

node.backup {
  port = 10001
  # priority of this node; each member should use a different priority
  priority = 8
  # IP list of the peer nodes; must not contain this node's own IP
  members = [
    # "ip"
  ]
}

If a node becomes unavailable due to database corruption or other issues, traffic can be switched to the backup node.
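
The switch-over decision can be sketched as below. This is a hypothetical illustration of a traffic router choosing between the two FullNodes, not a description of how node.backup works internally. The Node type, the healthy flag, the rule that the higher priority wins, and the documentation-range IP addresses are all assumptions.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Illustrative failover selection; not the java-tron node.backup implementation. */
final class FailoverSketch {

    /** Hypothetical view of one FullNode as seen by whatever routes the traffic. */
    static final class Node {
        final String ip;
        final int priority;
        final boolean healthy;

        Node(String ip, int priority, boolean healthy) {
            this.ip = ip;
            this.priority = priority;
            this.healthy = healthy;
        }
    }

    /** ASSUMPTION: among healthy nodes, the one with the highest priority serves traffic. */
    static Optional<Node> selectActive(List<Node> nodes) {
        return nodes.stream()
            .filter(n -> n.healthy)
            .max(Comparator.comparingInt((Node n) -> n.priority));
    }

    public static void main(String[] args) {
        // The primary is marked unhealthy, e.g. after database corruption.
        Node primary = new Node("192.0.2.10", 8, false);
        Node backup = new Node("192.0.2.11", 6, true);

        Optional<Node> active = selectActive(Arrays.asList(primary, backup));
        System.out.println(active.map(n -> "route traffic to " + n.ip)
            .orElse("no healthy node available"));
    }
}
```

Giving each member a distinct priority, as the configuration comment requires, keeps the choice unambiguous when both nodes are healthy.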

Specification

API Changes
None

Configuration Changes
Remove the storage.backup configuration item.

Protocol Changes
None

Scope of Impact

Breaking Changes
The database section will be impacted.

Backward Compatibility
Not compatible with v4.8.1 or older.

Implementation

Do you have ideas regarding the implementation?
Yes

Are you willing to implement this feature?
Yes

Estimated Complexity
Medium
