
CNV-80608: k8s: add alert listing/query and filter primitives #832

Closed
sradco wants to merge 1 commit into openshift:alerts-management-api from sradco:alert-mgmt-02-k8s-alert-queries




@sradco sradco commented Mar 11, 2026

Alert Management API — Part 2/15: alert listing, query & filter primitives

Summary

  • PrometheusAlerts implementation: fetches alerts and rule groups from both platform and user-workload Prometheus/Thanos endpoints
  • Namespace-scoped Thanos tenancy queries for user-workload rules
  • State-based alert filtering (firing, pending, silenced)
  • Exact-match label filtering (key=value) with special handling for namespace (used for tenancy selection, not rule filtering)
  • Prometheus-style label matchers (match[] semantics) supporting full selectors and selector bodies
  • TLS transport with service CA for in-cluster communication
  • Unit tests for label matcher compilation and matching semantics
  • Alerting health types (AlertingHealth, AlertingStackHealth, AlertingRouteHealth, RouteStatus) and health-checking functions (alertingHealth, stackHealth, routeHealth)
  • Alertmanager integration — fetching alerts from both Prometheus and Alertmanager APIs, parsing Alertmanager v2 responses, state mapping between Alertmanager/Prometheus states
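A minimal sketch of the state-based filtering and the Alertmanager/Prometheus state mapping listed above, using a hypothetical, simplified `alert` type rather than the PR's actual types. It relies on two facts: Alertmanager v2 reports each alert's `status.state` (`active`, `suppressed` or `unprocessed`) together with a `silencedBy` list, and Prometheus only sends firing alerts to Alertmanager, so `pending` can only come from the Prometheus side. Reporting inhibited-but-not-silenced alerts as firing is an assumption of this sketch, not necessarily what the PR does.

```go
package main

import "strings"

// alert is a hypothetical, simplified alert used only in this sketch.
type alert struct {
	Labels map[string]string
	State  string // "firing", "pending" or "silenced"
}

// stateFromAlertmanager maps an Alertmanager v2 alert (status.state plus
// status.silencedBy) to the firing/pending/silenced vocabulary. Alertmanager
// never holds pending alerts, so the result is either "silenced" or "firing";
// alerts suppressed only by inhibition are reported as firing here.
func stateFromAlertmanager(amState string, silencedBy []string) string {
	if amState == "suppressed" && len(silencedBy) > 0 {
		return "silenced"
	}
	return "firing"
}

// filterByState keeps alerts whose state is in the requested set
// (compared case-insensitively). An empty set keeps every alert.
func filterByState(alerts []alert, states []string) []alert {
	if len(states) == 0 {
		return alerts
	}
	want := make(map[string]bool, len(states))
	for _, s := range states {
		want[strings.ToLower(s)] = true
	}
	var out []alert
	for _, a := range alerts {
		if want[a.State] {
			out = append(out, a)
		}
	}
	return out
}
```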

Dependencies

This PR is part of a stacked series. Please review in order.

  1. alert-mgmt-01: k8s foundation + health stub skeleton ✅ merged
  2. → This PR — alert listing, query & filter primitives
  3–15. Pending: relabel config, alerting health, management foundation, CRUD endpoints, classification, bulk update, docs/CI/e2e, single-rule API, orphan GC, effective metric

Made with Cursor

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 11, 2026

openshift-ci-robot commented Mar 11, 2026

@sradco: This pull request references CNV-80608 which is a valid jira issue.


@openshift-ci openshift-ci bot requested review from PeterYurkovich and kyoto March 11, 2026 09:09

openshift-ci bot commented Mar 11, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sradco
Once this PR has been reviewed and has the lgtm label, please assign etmurasaki for approval. For more information see the Code Review Process.



coderabbitai bot commented Mar 11, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2e74893d-7fe5-491e-b6a4-efcd57f13522



Add PrometheusAlerts implementation with:
- Alert fetching from platform and user-workload Prometheus/Thanos
- Rule group fetching with namespace-scoped Thanos tenancy queries
- State-based filtering (firing, pending, silenced)
- Label-based flat filtering with key=value matching
- TLS transport with service CA for in-cluster communication

Signed-off-by: Shirly Radco <sradco@redhat.com>
Signed-off-by: João Vilaça <jvilaca@redhat.com>
Signed-off-by: Aviv Litman <alitman@redhat.com>
Co-authored-by: AI Assistant <noreply@cursor.com>
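The exact-match `key=value` filtering with the special `namespace` handling could look roughly like the sketch below. The function names and the input shape (a list of `key=value` strings) are hypothetical. As the summary describes, `namespace` is pulled out to select the tenancy endpoint rather than used as a rule filter, and every other pair must match a label exactly. Splitting on the first `=` lets values themselves contain `=`.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLabelFilters splits hypothetical "key=value" parameters into the
// namespace (used only to pick the tenancy endpoint) and the remaining
// exact-match label filters.
func parseLabelFilters(params []string) (namespace string, filters map[string]string, err error) {
	filters = make(map[string]string)
	for _, p := range params {
		// Cut at the first "=" so values may contain "=" themselves.
		k, v, ok := strings.Cut(p, "=")
		if !ok || k == "" {
			return "", nil, fmt.Errorf("invalid label filter %q, want key=value", p)
		}
		if k == "namespace" {
			namespace = v
			continue
		}
		filters[k] = v
	}
	return namespace, filters, nil
}

// matchesExact reports whether every filter key is present in labels with
// exactly the requested value.
func matchesExact(labels, filters map[string]string) bool {
	for k, want := range filters {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}
```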
@sradco sradco force-pushed the alert-mgmt-02-k8s-alert-queries branch from 08289c7 to ffe8541 March 11, 2026 09:27

sradco commented Mar 11, 2026

Hi @jgbernalp, @jan--f, @simonpasquier, I would appreciate your review of this PR. Thank you.


@simonpasquier simonpasquier left a comment


I've made a few comments, but overall the PR is hard to review because the added code is often not used. I'd rather have one API endpoint implemented at a time (even if not with all the functionality).

	return pa.executeRequest(ctx, client, requestURL)
}

func (pa *prometheusAlerts) resolveThanosTenancyRulesPort(ctx context.Context) int32 {

Why do we need to resolve the port? It's a fixed value (and, in a way, part of the CMO contract).
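If the port is treated as a fixed part of the CMO contract, as suggested here, building the namespace-scoped tenancy request needs no lookup at all. A sketch under stated assumptions: the Thanos Querier Service is taken to be `thanos-querier` in `openshift-monitoring`, its tenancy-rules port is assumed to be 9093, and the tenancy proxy is assumed to read the namespace from the `namespace` query parameter. All three should be checked against the CMO manifests; `tenancyRulesURL` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
)

// tenancyRulesPort is an ASSUMED fixed value for the Thanos Querier
// tenancy-rules port; verify it against the CMO manifests.
const tenancyRulesPort = 9093

// tenancyRulesURL builds a namespace-scoped rules request against the
// in-cluster Thanos Querier Service (service name and namespace assumed).
func tenancyRulesURL(namespace string) string {
	u := url.URL{
		Scheme:   "https",
		Host:     fmt.Sprintf("thanos-querier.openshift-monitoring.svc:%d", tenancyRulesPort),
		Path:     "/api/v1/rules",
		RawQuery: url.Values{"namespace": {namespace}}.Encode(),
	}
	return u.String()
}
```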

		if sel == "" {
			continue
		}
		if !strings.Contains(sel, "{") {

What if we have a "{" in the label value?
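One possible way to address this, sketched under assumptions with hypothetical helper names: decide whether the input is a full selector by looking only at its prefix (a leading `{`, or a metric name followed by `{` or end of input) instead of searching the whole string, so a `{` inside a quoted label value no longer changes the decision. A more robust option is to wrap the body and hand it to Prometheus's own `parser.ParseMetricSelector` (package `github.com/prometheus/prometheus/promql/parser`), which understands quoting.

```go
package main

import (
	"regexp"
	"strings"
)

// metricNamePrefix matches a metric name followed by "{" or by end of input.
var metricNamePrefix = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*\s*(\{|$)`)

// isFullSelector decides from the prefix only, so braces inside quoted label
// values (e.g. summary="a{b}") do not affect the result. A bare identifier
// such as "up" is treated as a metric-name selector.
func isFullSelector(sel string) bool {
	s := strings.TrimSpace(sel)
	return strings.HasPrefix(s, "{") || metricNamePrefix.MatchString(s)
}

// normalizeSelector wraps a selector body in braces; full selectors pass through.
func normalizeSelector(sel string) string {
	s := strings.TrimSpace(sel)
	if isFullSelector(s) {
		return s
	}
	return "{" + s + "}"
}
```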

func compileRuleLabelMatchers(req GetRulesRequest) ([]*labels.Matcher, error) {
	var out []*labels.Matcher

	for k, v := range req.Labels {

I don't understand why we need 2 ways (Labels and Matchers) for matching rules...
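For reference, exact `key=value` labels are a strict subset of Prometheus matchers (each pair is just an equality matcher), so a single matcher code path could serve both inputs. The stdlib-only sketch below uses a hypothetical `matcher` type that mirrors Prometheus semantics: regex matchers are fully anchored and a missing label compares as the empty string. In real code, `labels.NewMatcher` from `github.com/prometheus/prometheus/model/labels` provides this behaviour directly.

```go
package main

import (
	"regexp"
	"sort"
)

// matchType mirrors the four Prometheus matcher operators: =, !=, =~, !~.
type matchType int

const (
	matchEqual matchType = iota
	matchNotEqual
	matchRegexp
	matchNotRegexp
)

// matcher is a hypothetical stand-in for labels.Matcher.
type matcher struct {
	Type  matchType
	Name  string
	Value string
	re    *regexp.Regexp // compiled, fully anchored pattern for =~ and !~
}

// newMatcher builds a matcher; regex values are anchored as Prometheus does.
func newMatcher(t matchType, name, value string) (*matcher, error) {
	m := &matcher{Type: t, Name: name, Value: value}
	if t == matchRegexp || t == matchNotRegexp {
		re, err := regexp.Compile("^(?:" + value + ")$")
		if err != nil {
			return nil, err
		}
		m.re = re
	}
	return m, nil
}

// matches tests a single label value against the matcher.
func (m *matcher) matches(v string) bool {
	switch m.Type {
	case matchEqual:
		return v == m.Value
	case matchNotEqual:
		return v != m.Value
	case matchRegexp:
		return m.re.MatchString(v)
	default: // matchNotRegexp
		return !m.re.MatchString(v)
	}
}

// labelsToMatchers turns exact key=value filters into equality matchers,
// sorted by label name so the output is deterministic.
func labelsToMatchers(filters map[string]string) []*matcher {
	names := make([]string, 0, len(filters))
	for k := range filters {
		names = append(names, k)
	}
	sort.Strings(names)
	out := make([]*matcher, 0, len(names))
	for _, k := range names {
		out = append(out, &matcher{Type: matchEqual, Name: k, Value: filters[k]})
	}
	return out
}

// matchesAll reports whether a label set satisfies every matcher. A missing
// label reads as "" from the map, as in Prometheus.
func matchesAll(ms []*matcher, lset map[string]string) bool {
	for _, m := range ms {
		if !m.matches(lset[m.Name]) {
			return false
		}
	}
	return true
}
```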

		return groups, nil
	}

	return filterRuleGroupsByLabelMatchers(groups, matchers), nil

Why don't we apply the label matchers directly in the requests querying the backends?

		return nil, fmt.Errorf("failed to create PrometheusRule manager: %w", err)
	}

	c.prometheusAlerts = newPrometheusAlerts(routeClientset, clientset.CoreV1(), config, c.prometheusRuleManager)

Why go through the route?
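Inside the cluster, the backends can be reached through their Service DNS names instead of Routes, with TLS verified against the OpenShift service CA. A sketch, assuming the CA bundle is read from the pod (on OpenShift it is commonly mounted at `/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt`) and that the caller knows the Service name and port; `serviceURL` and `newServiceCAClient` are hypothetical helpers.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
)

// serviceURL returns the in-cluster DNS address of a Service, bypassing Routes.
func serviceURL(name, namespace string, port int) string {
	return fmt.Sprintf("https://%s.%s.svc:%d", name, namespace, port)
}

// newServiceCAClient returns an HTTP client that trusts only the given
// PEM-encoded service CA bundle.
func newServiceCAClient(caPEM []byte) (*http.Client, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no certificates found in service CA bundle")
	}
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{RootCAs: pool, MinVersion: tls.VersionTLS12},
		},
	}, nil
}
```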

	token := BearerTokenFromContext(ctx)
	if token == "" {
		var err error
		token, err = pa.loadBearerToken()

Does it mean that we use the service account's token?

	return filterRuleGroupsByLabelMatchers(groups, matchers), nil
}

func (pa *prometheusAlerts) alertingHealth(ctx context.Context) AlertingHealth {

I don't understand the purpose of this method.

@simonpasquier

There are also very few tests compared to the number of lines being added.


sradco commented Mar 12, 2026

@simonpasquier Let me address your concern about the split first, to hopefully make it easier to review.


sradco commented Mar 12, 2026

Closing this PR in favor of #841

@sradco sradco closed this Mar 12, 2026
@sradco sradco deleted the alert-mgmt-02-k8s-alert-queries branch March 12, 2026 18:54

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.


3 participants