feat(pr-metrics): Forward needs-judge terminal PR events to Seer by vaind · Pull Request #117512 · getsentry/sentry

vaind · 2026-06-12T10:22:16Z

Problem

select_verdict settles the deterministic PR outcomes (merged_unchanged, closed_unmerged) locally and returns None for everything that needs a judge — a merge with later commits, a close with engagement, or a missing metrics row. Those needs-judge PRs were previously dropped: logged and skipped, never recorded.

What this does

Wires the Sentry → Seer judge round-trip (CORE-217). When a tracked terminal event needs a judge, Sentry forwards it to Seer, and Seer settles it via the existing update_pr_metrics callback.

Forward path. The webhook claims a JUDGE_IN_PROGRESS sentinel via the same compare-and-set redelivery guard before enqueuing, so a redelivered terminal event can't forward twice (tolerant of a missing metrics row — the defer-to-judge case). The forward runs in the forward_pr_to_seer task off the request path (the signed Seer call is blocking); 5xx/429 retry, a permanent 4xx is observe-only (the row stays claimed and never emits — no reaper this phase).
Request contract. A pydantic PrCloseJudgeRequest (typed so contract drift fails the build, not at Seer). repo reuses the shared Seer RepoDefinition via build_repo_definition (split owner/name, bare provider slug). The body also carries the captured PullRequestActivity timeline — the event sequence and actors (Bot vs human pushes, review outcomes, labels, draft transitions) that the end-state counters flatten away. Payloads are structural-only (no titles/bodies/comment text).
Single-emit callback guard. update_pr_metrics only emits on the transition off the sentinel (or an unclaimed null), so a retried Seer callback settles to a no-op rather than a duplicate analytics row. The sentinel is rejected if Seer ever echoes it back as a result.
Sentinel. PullRequestVerdict.JUDGE_IN_PROGRESS — no migration (the verdict column is a CharField; Sentry's detector ignores choices changes).
Feature gated on organizations:pr-metrics-judge, independent of emission — a needs-judge PR keeps today's skip behavior until the flag is on.

Supporting refactors: promote the attribution/group collectors to public (shared by emission and the forward); promote build_repo_definition to public with an optional event_payload (the task-side forward has no webhook payload).

Contract sync

PrCloseJudgeRequest is a manual mirror of Seer's src/seer/pr_metrics/models.py (no shared package/codegen; both sides validate with pydantic, so drift surfaces loudly). The path /v1/pr-metrics/pr-close-judge matches the Seer route from CW-1436 (implemented).

Rollout & follow-ups

Flag stays off until rollout; the Seer judge currently consumes the deterministic-shaped fields. The activity timeline is forward-compatible — Seer adds an activity field to consume it (pairs with the real judges CW-1439 / CW-1440).
CORE-245 — capture additional PR webhooks (CI/check status, review dismissals, merge-queue). Net-new captures; they flow to the judge through the activity timeline once stored, no contract change.

Refs CORE-217

The judge forward needs the same confidence-ordered attribution snapshot and resolved group ids that emission builds. Promote the two collectors from module-private to public so the forward path reuses the ordering rather than duplicating it. No behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

select_verdict settles the deterministic outcomes (merged_unchanged, closed_unmerged) locally and returns None for everything that needs a judge. Until now those PRs were dropped. Wire the Sentry -> Seer forward so they're handed to Seer to judge, with Seer settling them via the existing update_pr_metrics callback. The webhook claims a JUDGE_IN_PROGRESS sentinel through the same compare-and-set redelivery guard before enqueuing the forward, so a redelivered terminal event can't forward twice. The forward itself is a blocking signed Seer call, so it runs in a task off the webhook request path; a permanent (4xx) rejection is observe-only for now (the row stays claimed and simply never emits), retryable statuses retry. Gated on the new pr-metrics-judge flag independently of emission, so a needs-judge PR keeps today's skip behavior until it's enabled. Make the callback single-shot to match: only the callback that transitions the row off the sentinel (or an unclaimed null) emits, so a retried Seer callback settles to a no-op instead of a duplicate analytics row. The Seer endpoint path is the one value that must match on both sides; it's pending the sibling Seer route and easy to align once that lands. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

linear-code · 2026-06-12T10:22:21Z

CORE-217

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…pace Use /v1/pr-metrics/pr-close-judge rather than a code_review path; the judge forward owns its own Seer route namespace. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…forward The judge forward sent a flat repo dict with a combined "owner/repo" name, which won't validate into Seer's RepoDefinition — it requires owner and name split, a bare provider slug, and the base/org/privacy fields. Reuse code review's repo definition builder so the forward emits the exact shape Seer already parses, with no normalization on the Seer side. Promote build_repo_definition to public and make its event_payload optional: the forward runs in a task with no webhook payload, so it leaves is_private unknown. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The Seer judge returns evidence/reasoning alongside the verdict, but update_pr_metrics had no parameter for it, so Seer's callback couldn't send it without the RPC dispatch raising on an unknown kwarg. Grow the callback to take an optional verdict_details object, validate it at the trust boundary, and persist it on PullRequestMetrics next to the verdict. Carry it onto the emitted analytics row (as a JSON-encoded object) so the judge's reasoning lands in BigQuery beside the verdict it explains — the row schema is meant to grow this way. Null whenever the verdict is, i.e. on the deterministic no-judge path and before a judge result arrives. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions · 2026-06-12T12:16:00Z

This PR has a migration; here is the generated SQL for src/sentry/migrations/1115_pull_request_metrics_verdict_details.py

for 1115_pull_request_metrics_verdict_details in sentry

--
-- Add field verdict_details to pullrequestmetrics
--
ALTER TABLE "sentry_pullrequest_metrics" ADD COLUMN "verdict_details" jsonb NULL;

… from Seer This reverts commit 5a5b744. Reason: The Seer judge isn't implemented end-to-end yet, so nothing produces the reasoning and there's no confirmed need to keep it on Sentry's side. Defer representing verdict_details here (column, callback param, analytics field) until the judge ships and we know it's worth retaining. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The judge forward built its request as a bare dict[str, Any], so mypy couldn't catch a renamed, dropped, or wrong-typed field — the body would only fail at Seer's validation. Type it as a PrCloseJudgeRequest TypedDict, matching how outbound Seer request bodies are typed in signed_seer_api, so contract drift fails the build instead. No behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…pedDict A TypedDict only types the body statically, and the repo sub-object — built by build_repo_definition, which returns dict[str, Any] — stayed an unchecked hole. That is exactly where the earlier owner/name shape mismatch lived, the kind of bug a TypedDict can't catch. Model the body as a pydantic PrCloseJudgeRequest reusing code review's SeerCodeReviewRepoDefinition for the repo field, matching how the sibling prevent-AI forwards (and Seer's own model) are defined. parse_obj now validates the built repo shape before send, so contract drift fails here rather than as a Seer-side rejection, and .json() handles serialization. No behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The judge request sent only end-state aggregates (counts, terminal facts), which flatten away the signals the judge most needs: who pushed the post-open commits (Bot vs human — the merged_unchanged/with_iteration question), review outcomes, labels, and draft transitions. All of that is already captured per-event in PullRequestActivity. Forward those rows as an ordered activity timeline. The stored payloads are structural-only (titles/bodies/comment text are excluded at capture), so each is safe to forward verbatim. This adds only data we already hold; capturing new webhook types (CI/check status, review dismissals, merge-queue) is a follow-up. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit c80d8a8. Configure here.}

cursor · 2026-06-12T15:39:25Z

+                "pr_metrics.update.already_settled", extra={**log_extra, "verdict": verdict}
+            )
+            metrics.incr("pr_metrics.update.skipped", tags={"reason": "already_settled"})
+            return {"success": True}


Retry skips emit after failure

Medium Severity

The single-emit guard treats any callback that finds an already-settled verdict as complete success and skips emit_pr_metrics_row. If the first callback commits the judged verdict but analytics emission fails afterward, a Seer retry sees already_settled and returns success without ever emitting the metrics row.

^{Reviewed by Cursor Bugbot for commit c80d8a8. Configure here.}

Good catch on the mechanics, but this is a deliberate, documented tradeoff rather than a bug.

The verdict (the durable outcome) is committed in the transaction; emit_pr_metrics_row is best-effort, async-batched telemetry. A retry settling to already_settled preserves the single-emit invariant — re-emitting on retry is exactly the duplicate analytics row we guard against (the deterministic webhook path documents the identical "claim stands, telemetry row forgone on analytics.record failure — acceptable loss" tradeoff). Guaranteeing the emit would require tracking emit-success separately from the verdict, which we're intentionally not doing in this phase.

Leaving as-is by design.

_forward_to_judge claims the JUDGE_IN_PROGRESS sentinel before enqueuing the forward task. If the enqueue itself fails (e.g. broker unavailable), the claim was committed but no task will ever settle the PR, and redeliveries can't reclaim (the guard requires a null verdict) — so it sticks in judge_in_progress forever. Release the sentinel when delay() raises (only if it's still ours), so a webhook redelivery re-forwards. This is the cheaply-recoverable case — distinct from a task that exhausts retries or a callback Seer drops, which the deferred reaper covers — and mirrors the "don't burn the guard unless we acted" rule already used for untracked PRs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

sentry · 2026-06-12T18:47:43Z

+        settled = (
+            PullRequestMetrics.objects.filter(pull_request=pull_request)
+            .filter(Q(verdict=PullRequestVerdict.JUDGE_IN_PROGRESS) | Q(verdict__isnull=True))
+            .update(verdict=verdict)
        )
+        if not settled:
+            logger.info(
+                "pr_metrics.update.already_settled", extra={**log_extra, "verdict": verdict}
+            )
+            metrics.incr("pr_metrics.update.skipped", tags={"reason": "already_settled"})
+            return {"success": True}


Bug: The call to emit_pr_metrics_row is outside the atomic transaction, causing permanent data loss if the emission fails, as retries will not re-emit.
_{Severity: MEDIUM}

Suggested Fix

Move the emit_pr_metrics_row(pull_request=pr) call inside the transaction.atomic block. This ensures that an emission failure will roll back the entire transaction, allowing the operation to be retried correctly without data loss.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: src/sentry/pr_metrics/judge.py#L322-L336 Potential issue: The call to `emit_pr_metrics_row` in `src/sentry/pr_metrics/judge.py` occurs after the atomic database transaction that commits the pull request verdict has completed. If `emit_pr_metrics_row` raises an exception, for example due to a network error during the `analytics.record` call, the process will halt. Because the verdict has already been saved to the database, any subsequent retry will detect that the verdict is settled and will not re-attempt to emit the analytics data. This results in the permanent loss of the metrics row for that pull request.

_{Did we get this right? 👍 / 👎 to inform future reviews.}

sentry · 2026-06-12T18:47:43Z

+        metrics.incr("pr_metrics.judge.forward_failed", tags={"reason": "repo_not_found"})
+        return
+
+    forward_pr_to_seer_judge(pull_request, repository)


Bug: The forward_pr_to_seer_judge task lacks exception handling for ValueError or ValidationError, causing PRs to become permanently stuck in JUDGE_IN_PROGRESS if these errors occur.
_{Severity: MEDIUM}

Suggested Fix

Wrap the call to forward_pr_to_seer_judge within the task in a try...except block to catch ValueError and pydantic.ValidationError. In the except block, reset the PR's verdict from JUDGE_IN_PROGRESS to None to allow future retries or processing. Alternatively, add these exceptions to the task's retry decorator.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: src/sentry/pr_metrics/tasks.py#L67-L70 Potential issue: The `forward_pr_to_seer_judge` task is configured to retry only on `HTTPError`. However, the task can fail due to unhandled exceptions like `ValueError` from `build_repo_definition` (e.g., if a GitLab repository is missing its `config['path']`) or `pydantic.ValidationError`. When such an unhandled exception occurs, the task fails without retrying, and there is no mechanism to reset the pull request's status. As a result, the pull request's verdict remains permanently stuck in the `JUDGE_IN_PROGRESS` state, preventing it from ever being judged.

Also affects:

src/sentry/pr_metrics/judge.py:161~163

_{Did we get this right? 👍 / 👎 to inform future reviews.}

vaind and others added 2 commits June 12, 2026 12:01

github-actions Bot added the Scope: Backend Automatically applied to PRs that change backend components label Jun 12, 2026

vaind and others added 4 commits June 12, 2026 12:59

docs(pr-metrics): Drop issue ref from the judge-path comment

3a43346

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

fix(pr-metrics): Point the judge forward at the pr-metrics Seer names…

5fefe72

…pace Use /v1/pr-metrics/pr-close-judge rather than a code_review path; the judge forward owns its own Seer route namespace. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vaind and others added 5 commits June 12, 2026 14:42

docs(pr-metrics): Note the judge contract mirrors Seer's models

c80d8a8

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vaind marked this pull request as ready for review June 12, 2026 15:38

vaind requested review from a team as code owners June 12, 2026 15:38

cursor Bot reviewed Jun 12, 2026

View reviewed changes

sentry Bot reviewed Jun 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(pr-metrics): Forward needs-judge terminal PR events to Seer#117512

feat(pr-metrics): Forward needs-judge terminal PR events to Seer#117512
vaind wants to merge 12 commits into
masterfrom
ivandlugos/core-217-pr-metrics-forward-to-seer

vaind commented Jun 12, 2026 •

edited

Loading

Uh oh!

linear-code Bot commented Jun 12, 2026

Uh oh!

github-actions Bot commented Jun 12, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Jun 12, 2026

Uh oh!

vaind Jun 12, 2026

Uh oh!

Uh oh!

sentry Bot Jun 12, 2026

Uh oh!

sentry Bot Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

vaind commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

What this does

Contract sync

Rollout & follow-ups

Uh oh!

linear-code Bot commented Jun 12, 2026

Uh oh!

github-actions Bot commented Jun 12, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 12, 2026

Choose a reason for hiding this comment

Retry skips emit after failure

Uh oh!

vaind Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sentry Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

sentry Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vaind commented Jun 12, 2026 •

edited

Loading