Skip to content

feat(metrics): add block transaction count histogram for empty block monitoring#6602

Open
ToXMon wants to merge 5 commits intotronprotocol:developfrom
ToXMon:feature/prometheus-infra-metrics
Open

feat(metrics): add block transaction count histogram for empty block monitoring#6602
ToXMon wants to merge 5 commits intotronprotocol:developfrom
ToXMon:feature/prometheus-infra-metrics

Conversation

@ToXMon
Copy link
Copy Markdown

@ToXMon ToXMon commented Mar 25, 2026

Summary

Implements Issue #6590 — adds Prometheus metrics for empty block detection and SR set change monitoring:

  • tron:block_transaction_count — Histogram tracking distribution of transaction counts per block (enables empty block detection via bucket le="0.0")
  • tron:sr_set_change_total — Counter tracking SR set changes with added/removed labels

Changes

New Metrics

Metric Type Labels Description
tron:block_transaction_count Histogram miner Distribution of tx counts per block (empty blocks: le="0.0")
tron:sr_set_change_total Counter witness, change_type SR set changes (added/removed)

Key Implementation Details

  • Histogram buckets: [0, 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000]
  • Empty block query: tron:block_transaction_count_bucket{le="0.0"}
  • Empty block ratio: rate(tron:block_transaction_count_bucket{le="0.0"}[1h]) / rate(tron:block_transaction_count_count[1h])

Files Modified

  1. common/src/main/java/org/tron/common/prometheus/MetricKeys.java — removed BLOCK_EMPTY counter, added BLOCK_TRANSACTION_COUNT histogram
  2. common/src/main/java/org/tron/common/prometheus/MetricsCounter.java — removed BLOCK_EMPTY registration
  3. common/src/main/java/org/tron/common/prometheus/MetricsHistogram.java — added overloaded init() for custom buckets
  4. framework/src/main/java/org/tron/core/metrics/blockchain/BlockChainMetricManager.java — histogramObserve() for all blocks, keeps SR counter
  5. framework/src/test/java/org/tron/core/metrics/prometheus/PrometheusApiServiceTest.java — updated tests for histogram

Testing

  • testMetric() — PASSED
  • testEmptyBlockMetric() — PASSED
  • testSrSetChangeMetric() — PASSED

Advantages Over Simple Counter

  • Rich insights: Tracks full distribution of tx counts, not just empty blocks
  • Flexible queries: Operators can query percentiles, trends, specific ranges
  • Backward compatible: Still enables empty block monitoring via le="0.0" bucket

Backward Compatibility

Fully backward compatible. Zero protocol changes, zero API changes, zero config changes.

Closes #6590

…monitoring

Replace the dedicated tron:block_empty_total counter with a more comprehensive
tron:block_transaction_count histogram that tracks the distribution of
transaction counts per block.

Changes:
- Add overloaded init() method in MetricsHistogram to support custom buckets
- Add BLOCK_TRANSACTION_COUNT histogram with buckets [0, 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000]
- Record transaction count for all blocks (including empty blocks with txCount=0)
- Empty blocks can be queried via bucket le=0.0
- Remove unused BLOCK_EMPTY counter
- Keep SR_SET_CHANGE counter for SR set change monitoring

This provides richer insights for network analysis while still supporting
empty block monitoring via histogram bucket queries.

Closes tronprotocol#6590
Copy link
Copy Markdown

@warku123 warku123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ToXMon and maintainers,

I actually have an alternative implementation for Issue #6590 that addresses some of the concerns mentioned above:

Key differences in my approach:

  1. SR Change Detection: Only triggers during maintenance period changes (using nextMaintenanceTime check) to avoid per-block overhead
  2. Label Alignment: Uses action (add/remove) as specified in the original issue
  3. Implementation: Uses standard Set operations without additional Guava dependencies

My implementation includes:

  • Histogram with buckets [0, 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000] for transaction counts
  • SR set change Counter with proper maintenance period detection
  • Comprehensive tests for histogram bucket queries (le="0.0")

The branch is ready and passes all CI checks. I'm happy to:

  • Submit a separate PR for comparison
  • Collaborate with @xl-2055 to merge the best aspects of both implementations

Let me know how you'd like to proceed!

StringUtil.encode58Check(address));

// SR set change detection
List<ByteString> currentSrList =
Copy link
Copy Markdown

@warku123 warku123 Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ToXMon, thanks for the contribution! I've identified a few issues that need attention:

1. SR Set Change Detection Logic Issue
The current implementation checks for SR changes on every applyBlock() call without considering the maintenance period:

// Current: runs on EVERY block
List<ByteString> currentSrList = chainBaseManager.getWitnessScheduleStore().getActiveWitnesses();

However, SR set changes only happen during maintenance periods. This could cause:

  • Unnecessary performance overhead (set comparison on every block)
  • False positives if there are temporary witness list fluctuations

2. Label Naming Inconsistency
The Issue #6590 specification suggests using action (add/remove) for the label key, but this PR uses change_type (added/removed). While both work, aligning with the original issue specification would be better for consistency.

3. GitHub CI Check
It looks like the PR may not pass all CI checks yet. Please ensure:

  • All tests pass (./gradlew test)
  • Checkstyle passes (./gradlew checkstyleMain checkstyleTest)
  • No compilation warnings

private long failProcessBlockNum = 0;
@Setter
private String failProcessBlockReason = "";
private Set<String> previousSrSet = new HashSet<>();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if the applyBlock() call path is currently single-threaded
(protected by the Manager lock), writes to previousSrSet are not guaranteed to be visible to other threads (e.g., Metrics query threads) without proper synchronization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Add Prometheus metrics for empty blocks and SR set changes

4 participants