Description
Problem Statement
The database backup feature was originally introduced to enable fast recovery after disk failures, and it was useful in the early stages of chain development. It has also been optimized over time, for example by completing the implementation of the backup database.
However, as the database has grown rapidly, the feature has started to cause serious problems. Most notably, a long-running backup blocks block synchronization, so the drawbacks now significantly outweigh the benefits.
Proposed Solution
Why should it be removed?
The database backup feature is configured as follows:
storage.backup = {
  enable = false // whether to enable the backup plugin
  propPath = "prop.properties" // records which backup directory is currently valid
  bak1path = "bak1/database" // two backup directories must be set to guard against an unexpected application halt (e.g. kill -9)
  bak2path = "bak2/database"
  frequency = 10000 // back up the databases once every 10000 blocks processed
}
When this feature is enabled (enable = true), pushBlock(final BlockCapsule block) checks whether the block number is a multiple of frequency (block number % frequency == 0). If it is, every database that implements the RevokingDatabase interface is copied to the alternate backup directory.
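The trigger condition described above can be sketched as follows. This is a minimal illustration only: the class BackupTrigger and its method names are hypothetical and are not taken from the actual code base; only the enable flag, the frequency value, and the modulo check come from the description above.

```java
// Hypothetical sketch of the backup trigger check; names are illustrative only.
final class BackupTrigger {
    private final boolean enabled; // mirrors storage.backup.enable
    private final long frequency;  // mirrors storage.backup.frequency

    BackupTrigger(boolean enabled, long frequency) {
        if (frequency <= 0) {
            throw new IllegalArgumentException("frequency must be positive");
        }
        this.enabled = enabled;
        this.frequency = frequency;
    }

    // Returns true when a block with this number would start a full database copy.
    boolean shouldBackup(long blockNumber) {
        return enabled && blockNumber % frequency == 0;
    }

    public static void main(String[] args) {
        BackupTrigger trigger = new BackupTrigger(true, 10_000);
        for (long n : new long[] {9_999, 10_000, 10_001, 20_000}) {
            System.out.println(n + " -> " + trigger.shouldBackup(n));
        }
    }
}
```

Because the check runs inside block processing, any block that satisfies it has to wait for the copy to finish before the next block can be pushed.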
This mechanism was designed to address data corruption caused by sporadic disk failures, power outages, or abrupt process termination (e.g. kill -9) in the early stages of deployment.
Current Major Issues:
- After optimization and extensive testing, it has been confirmed that kill -9 does not corrupt the database, which significantly reduces the need for periodic database backups.
- State-related databases are extremely large. As of 2026-01-28, the mainnet state database is close to 3 TB, and a single copy operation can take several hours. During the backup window the node cannot synchronize blocks, which poses a serious risk to service stability.
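As a rough back-of-the-envelope check of the "several hours" figure, the sketch below estimates the copy time for a 3 TB database. The throughput values are assumptions chosen for illustration, not measurements from any node, and the class name CopyTimeEstimate is hypothetical.

```java
// Rough estimate of how long a full database copy takes at a given throughput.
final class CopyTimeEstimate {
    // sizeTb uses decimal units (1 TB = 1,000,000 MB); throughput is sustained MB/s.
    static double hoursToCopy(double sizeTb, double throughputMbPerSec) {
        if (throughputMbPerSec <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        double sizeMb = sizeTb * 1_000_000.0;
        double seconds = sizeMb / throughputMbPerSec;
        return seconds / 3600.0;
    }

    public static void main(String[] args) {
        double sizeTb = 3.0; // approximate mainnet state database size from the text
        // Assumed sustained copy throughputs in MB/s (illustrative values only).
        for (double mbPerSec : new double[] {200.0, 500.0, 1000.0}) {
            System.out.printf("%.0f MB/s -> %.2f hours%n", mbPerSec, hoursToCopy(sizeTb, mbPerSec));
        }
    }
}
```

At an assumed sustained 200 MB/s, the copy takes a little over four hours, during which block synchronization is stalled.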
Alternative Solution
An alternative to database backup is to deploy dual FullNodes in a primary–backup configuration. The configuration is as follows:
node.backup {
  port = 10001
  # this node's priority; each member should use a different priority
  priority = 8
  # peers' IP list; must not contain this node's own IP
  members = [
    # "ip"
  ]
}
If a node becomes unavailable due to database corruption or other issues, traffic can be switched to the backup node.
Specification
API Changes
None
Configuration Changes
Remove the storage.backup configuration item.
Protocol Changes
None
Scope of Impact
Breaking Changes
The database module will be affected.
Backward Compatibility
Not compatible with v4.8.1 or older.
Implementation
Do you have ideas regarding the implementation?
Yes
Are you willing to implement this feature?
Yes
Estimated Complexity
Medium