testing: Add Test Coverage for Prometheus Metrics #269

@chinma-yyy

Problem

We're exposing 5 Prometheus metrics from the controller, but only 1 of them (BootstrapCompleted) has test coverage. This makes it hard to catch regressions when the behavior of a metric changes.

Current Metrics Status

  • BootstrapCompleted - Has tests
  • RulesTotal - No tests
  • TaintOperations - No tests
  • Failures - No tests
  • EvaluationDuration - No tests (and not even used in code yet)

What Needs to Be Done

1. Add tests for RulesTotal metric

This gauge tracks the total number of active rules. We should test:

  • Value increases when a rule is added to the cache
  • Value decreases when a rule is removed from the cache
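
As a rough illustration of the two checks above, here is a minimal sketch that uses only the standard library. Everything in it is an assumption made for illustration: `ruleCache`, `newRuleCache`, `fakeGauge`, and the narrow `gauge` interface are hypothetical names, not the controller's actual types. The interface only requires `Set(float64)`, which a real Prometheus gauge also provides, so the same wiring could be exercised against RulesTotal and read back with `testutil.ToFloat64` from client_golang.

```go
package main

import "fmt"

// gauge is the single method this sketch needs from a gauge metric.
type gauge interface {
	Set(float64)
}

// fakeGauge remembers the last value it was given (hypothetical test double).
type fakeGauge struct{ value float64 }

func (g *fakeGauge) Set(v float64) { g.value = v }

// ruleCache is a hypothetical stand-in for the controller's rule cache.
type ruleCache struct {
	rules map[string]struct{}
	total gauge
}

func newRuleCache(total gauge) *ruleCache {
	return &ruleCache{rules: map[string]struct{}{}, total: total}
}

// Add stores the rule and publishes the new rule count.
func (c *ruleCache) Add(name string) {
	c.rules[name] = struct{}{}
	c.total.Set(float64(len(c.rules)))
}

// Remove deletes the rule (if present) and publishes the new rule count.
func (c *ruleCache) Remove(name string) {
	delete(c.rules, name)
	c.total.Set(float64(len(c.rules)))
}

func main() {
	g := &fakeGauge{}
	cache := newRuleCache(g)
	cache.Add("rule-a")
	cache.Add("rule-b")
	fmt.Println(g.value) // 2
	cache.Remove("rule-a")
	fmt.Println(g.value) // 1
}
```

Deriving the gauge value from the cache size, rather than calling Inc/Dec, keeps the metric correct when the same rule is added twice or a missing rule is removed. Whether the controller does this is not stated in this issue.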

2. Add tests for TaintOperations metric

This counter tracks taint add/remove operations. We should test:

  • Counter increments when adding a taint (with labels: rule, operation="add")
  • Counter increments when removing a taint (with labels: rule, operation="remove")
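
The label-specific assertions above could look roughly like the following standard-library sketch. `labeledCounter` and `recordTaintOp` are hypothetical helpers invented for this example. In the real controller the increment would presumably go through `TaintOperations.WithLabelValues(rule, operation).Inc()`, assuming TaintOperations is a client_golang CounterVec with those two labels in that order.

```go
package main

import (
	"fmt"
	"strings"
)

// labeledCounter is a hypothetical in-memory stand-in for a counter vector:
// it keeps one count per unique combination of label values.
type labeledCounter struct{ counts map[string]float64 }

func newLabeledCounter() *labeledCounter {
	return &labeledCounter{counts: map[string]float64{}}
}

// key joins label values with a separator that cannot appear in normal labels.
func key(labelValues []string) string { return strings.Join(labelValues, "\x00") }

func (c *labeledCounter) Inc(labelValues ...string) { c.counts[key(labelValues)]++ }

func (c *labeledCounter) Value(labelValues ...string) float64 {
	return c.counts[key(labelValues)]
}

// recordTaintOp is a hypothetical helper that counts one taint operation
// under the labels (rule, operation).
func recordTaintOp(c *labeledCounter, rule, operation string) {
	c.Inc(rule, operation)
}

func main() {
	ops := newLabeledCounter()
	recordTaintOp(ops, "gpu-ready", "add")
	recordTaintOp(ops, "gpu-ready", "add")
	recordTaintOp(ops, "gpu-ready", "remove")

	fmt.Println(ops.Value("gpu-ready", "add"))
	fmt.Println(ops.Value("gpu-ready", "remove"))
	fmt.Println(ops.Value("other-rule", "add")) // untouched series stays at zero
}
```

Asserting on an untouched label combination as well helps catch tests that pass only because every series is incremented.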

3. Add tests for Failures metric

This counter tracks operational failures. We should test:

  • Counter increments on evaluation errors (label: reason="EvaluationError")
  • Counter increments on taint operation failures (labels: reason="AddTaintError" or reason="RemoveTaintError")
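
One way to reach these failure paths in a unit test is to inject failing dependencies and check which reason was recorded. The sketch below is an assumption-heavy illustration: `processNode`, its function parameters, the `failureRecorder` type, and the rule "a true evaluation means add the taint" are all hypothetical. Only the three reason strings come from this issue.

```go
package main

import (
	"errors"
	"fmt"
)

// errInjected is the error returned by the failing fakes in this sketch.
var errInjected = errors.New("injected failure")

// failureRecorder abstracts "increment the Failures counter for this reason".
type failureRecorder func(reason string)

// processNode is a hypothetical reconcile step: evaluate the rule, then add
// or remove the taint, recording a failure reason at whichever stage fails.
func processNode(
	evaluate func() (bool, error),
	addTaint, removeTaint func() error,
	record failureRecorder,
) error {
	shouldTaint, err := evaluate()
	if err != nil {
		record("EvaluationError")
		return err
	}
	if shouldTaint {
		if err := addTaint(); err != nil {
			record("AddTaintError")
			return err
		}
		return nil
	}
	if err := removeTaint(); err != nil {
		record("RemoveTaintError")
		return err
	}
	return nil
}

func main() {
	failures := map[string]int{}
	record := func(reason string) { failures[reason]++ }
	ok := func() error { return nil }
	fail := func() error { return errInjected }

	// Evaluation itself fails.
	_ = processNode(func() (bool, error) { return false, errInjected }, ok, ok, record)
	// Evaluation asks for a taint, but adding it fails.
	_ = processNode(func() (bool, error) { return true, nil }, fail, ok, record)
	// Evaluation asks for removal, but removing fails.
	_ = processNode(func() (bool, error) { return false, nil }, ok, fail, record)
	// Success path: nothing should be recorded.
	_ = processNode(func() (bool, error) { return true, nil }, ok, ok, record)

	fmt.Println(failures)
}
```

Including the success path in the same test covers the "both success and failure scenarios" requirement, since it shows that the counter does not move when nothing goes wrong.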

4. Implement and test EvaluationDuration metric

This histogram is defined but never used. We should:

  • Add instrumentation to the evaluateRuleForNode() function to record evaluation duration
  • Add tests to verify the histogram records evaluations
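
The instrumentation step could be sketched as a small timing wrapper, shown here with the standard library only. `timed`, `observer`, and `fakeHistogram` are hypothetical names. The `observer` interface asks only for `Observe(float64)`, which a client_golang histogram also implements, so EvaluationDuration could be passed in directly. Recording seconds is an assumption based on the usual Prometheus convention, not something this issue specifies.

```go
package main

import (
	"fmt"
	"time"
)

// observer is the single method this sketch needs from a histogram metric.
type observer interface {
	Observe(float64)
}

// fakeHistogram collects every observed sample (hypothetical test double).
type fakeHistogram struct{ samples []float64 }

func (h *fakeHistogram) Observe(v float64) { h.samples = append(h.samples, v) }

// timed runs fn and reports its wall-clock duration in seconds to obs.
// The deferred call fires on both the success and the error path.
func timed(obs observer, fn func() error) error {
	start := time.Now()
	defer func() { obs.Observe(time.Since(start).Seconds()) }()
	return fn()
}

func main() {
	h := &fakeHistogram{}
	err := timed(h, func() error {
		// Stands in for the work done inside evaluateRuleForNode().
		time.Sleep(5 * time.Millisecond)
		return nil
	})
	fmt.Println(err, len(h.samples))
}
```

A test built on this shape can assert that exactly one sample is recorded per evaluation without depending on the exact duration, which keeps it from being flaky on slow CI machines.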

Acceptance Criteria

  • All 5 metrics have unit test coverage
  • Tests verify metrics behave correctly in both success and failure scenarios
  • All tests pass with make test
