Skip to content

feat(pr-metrics): Forward needs-judge terminal PR events to Seer#117512

Open
vaind wants to merge 12 commits into
masterfrom
ivandlugos/core-217-pr-metrics-forward-to-seer
Open

feat(pr-metrics): Forward needs-judge terminal PR events to Seer#117512
vaind wants to merge 12 commits into
masterfrom
ivandlugos/core-217-pr-metrics-forward-to-seer

Conversation

@vaind

@vaind vaind commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Problem

select_verdict settles the deterministic PR outcomes (merged_unchanged, closed_unmerged) locally and returns None for everything that needs a judge — a merge with later commits, a close with engagement, or a missing metrics row. Those needs-judge PRs were previously dropped: logged and skipped, never recorded.

What this does

Wires the Sentry → Seer judge round-trip (CORE-217). When a tracked terminal event needs a judge, Sentry forwards it to Seer, and Seer settles it via the existing update_pr_metrics callback.

  • Forward path. The webhook claims a JUDGE_IN_PROGRESS sentinel via the same compare-and-set redelivery guard before enqueuing, so a redelivered terminal event can't forward twice (tolerant of a missing metrics row — the defer-to-judge case). The forward runs in the forward_pr_to_seer task off the request path (the signed Seer call is blocking); 5xx/429 retry, a permanent 4xx is observe-only (the row stays claimed and never emits — no reaper this phase).
  • Request contract. A pydantic PrCloseJudgeRequest (typed so contract drift fails the build, not at Seer). repo reuses the shared Seer RepoDefinition via build_repo_definition (split owner/name, bare provider slug). The body also carries the captured PullRequestActivity timeline — the event sequence and actors (Bot vs human pushes, review outcomes, labels, draft transitions) that the end-state counters flatten away. Payloads are structural-only (no titles/bodies/comment text).
  • Single-emit callback guard. update_pr_metrics only emits on the transition off the sentinel (or an unclaimed null), so a retried Seer callback settles to a no-op rather than a duplicate analytics row. The sentinel is rejected if Seer ever echoes it back as a result.
  • Sentinel. PullRequestVerdict.JUDGE_IN_PROGRESS — no migration (the verdict column is a CharField; Sentry's detector ignores choices changes).
  • Feature gated on organizations:pr-metrics-judge, independent of emission — a needs-judge PR keeps today's skip behavior until the flag is on.

Supporting refactors: promote the attribution/group collectors to public (shared by emission and the forward); promote build_repo_definition to public with an optional event_payload (the task-side forward has no webhook payload).

Contract sync

PrCloseJudgeRequest is a manual mirror of Seer's src/seer/pr_metrics/models.py (no shared package/codegen; both sides validate with pydantic, so drift surfaces loudly). The path /v1/pr-metrics/pr-close-judge matches the Seer route from CW-1436 (implemented).

Rollout & follow-ups

  • Flag stays off until rollout; the Seer judge currently consumes the deterministic-shaped fields. The activity timeline is forward-compatible — Seer adds an activity field to consume it (pairs with the real judges CW-1439 / CW-1440).
  • CORE-245 — capture additional PR webhooks (CI/check status, review dismissals, merge-queue). Net-new captures; they flow to the judge through the activity timeline once stored, no contract change.

Refs CORE-217

vaind and others added 2 commits June 12, 2026 12:01
The judge forward needs the same confidence-ordered attribution snapshot and
resolved group ids that emission builds. Promote the two collectors from
module-private to public so the forward path reuses the ordering rather than
duplicating it. No behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
select_verdict settles the deterministic outcomes (merged_unchanged,
closed_unmerged) locally and returns None for everything that needs a judge.
Until now those PRs were dropped. Wire the Sentry -> Seer forward so they're
handed to Seer to judge, with Seer settling them via the existing
update_pr_metrics callback.

The webhook claims a JUDGE_IN_PROGRESS sentinel through the same compare-and-set
redelivery guard before enqueuing the forward, so a redelivered terminal event
can't forward twice. The forward itself is a blocking signed Seer call, so it
runs in a task off the webhook request path; a permanent (4xx) rejection is
observe-only for now (the row stays claimed and simply never emits), retryable
statuses retry. Gated on the new pr-metrics-judge flag independently of
emission, so a needs-judge PR keeps today's skip behavior until it's enabled.

Make the callback single-shot to match: only the callback that transitions the
row off the sentinel (or an unclaimed null) emits, so a retried Seer callback
settles to a no-op instead of a duplicate analytics row.

The Seer endpoint path is the one value that must match on both sides; it's
pending the sibling Seer route and easy to align once that lands.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@linear-code

linear-code Bot commented Jun 12, 2026

Copy link
Copy Markdown

CORE-217

@github-actions github-actions Bot added the Scope: Backend Automatically applied to PRs that change backend components label Jun 12, 2026
vaind and others added 4 commits June 12, 2026 12:59
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…pace

Use /v1/pr-metrics/pr-close-judge rather than a code_review path; the judge
forward owns its own Seer route namespace.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…forward

The judge forward sent a flat repo dict with a combined "owner/repo" name, which
won't validate into Seer's RepoDefinition — it requires owner and name split, a
bare provider slug, and the base/org/privacy fields. Reuse code review's repo
definition builder so the forward emits the exact shape Seer already parses, with
no normalization on the Seer side.

Promote build_repo_definition to public and make its event_payload optional: the
forward runs in a task with no webhook payload, so it leaves is_private unknown.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Seer judge returns evidence/reasoning alongside the verdict, but
update_pr_metrics had no parameter for it, so Seer's callback couldn't send it
without the RPC dispatch raising on an unknown kwarg. Grow the callback to take
an optional verdict_details object, validate it at the trust boundary, and
persist it on PullRequestMetrics next to the verdict.

Carry it onto the emitted analytics row (as a JSON-encoded object) so the
judge's reasoning lands in BigQuery beside the verdict it explains — the row
schema is meant to grow this way. Null whenever the verdict is, i.e. on the
deterministic no-judge path and before a judge result arrives.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown
Contributor

This PR has a migration; here is the generated SQL for src/sentry/migrations/1115_pull_request_metrics_verdict_details.py

for 1115_pull_request_metrics_verdict_details in sentry

--
-- Add field verdict_details to pullrequestmetrics
--
ALTER TABLE "sentry_pullrequest_metrics" ADD COLUMN "verdict_details" jsonb NULL;

vaind and others added 5 commits June 12, 2026 14:42
… from Seer

This reverts commit 5a5b744.

Reason: The Seer judge isn't implemented end-to-end yet, so nothing produces the
reasoning and there's no confirmed need to keep it on Sentry's side. Defer
representing verdict_details here (column, callback param, analytics field) until
the judge ships and we know it's worth retaining.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The judge forward built its request as a bare dict[str, Any], so mypy couldn't
catch a renamed, dropped, or wrong-typed field — the body would only fail at
Seer's validation. Type it as a PrCloseJudgeRequest TypedDict, matching how
outbound Seer request bodies are typed in signed_seer_api, so contract drift
fails the build instead. No behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…pedDict

A TypedDict only types the body statically, and the repo sub-object — built by
build_repo_definition, which returns dict[str, Any] — stayed an unchecked hole.
That is exactly where the earlier owner/name shape mismatch lived, the kind of
bug a TypedDict can't catch.

Model the body as a pydantic PrCloseJudgeRequest reusing code review's
SeerCodeReviewRepoDefinition for the repo field, matching how the sibling
prevent-AI forwards (and Seer's own model) are defined. parse_obj now validates
the built repo shape before send, so contract drift fails here rather than as a
Seer-side rejection, and .json() handles serialization. No behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The judge request sent only end-state aggregates (counts, terminal facts), which
flatten away the signals the judge most needs: who pushed the post-open commits
(Bot vs human — the merged_unchanged/with_iteration question), review outcomes,
labels, and draft transitions. All of that is already captured per-event in
PullRequestActivity.

Forward those rows as an ordered activity timeline. The stored payloads are
structural-only (titles/bodies/comment text are excluded at capture), so each is
safe to forward verbatim. This adds only data we already hold; capturing new
webhook types (CI/check status, review dismissals, merge-queue) is a follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@vaind vaind marked this pull request as ready for review June 12, 2026 15:38
@vaind vaind requested review from a team as code owners June 12, 2026 15:38

@cursor cursor Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit c80d8a8. Configure here.

"pr_metrics.update.already_settled", extra={**log_extra, "verdict": verdict}
)
metrics.incr("pr_metrics.update.skipped", tags={"reason": "already_settled"})
return {"success": True}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retry skips emit after failure

Medium Severity

The single-emit guard treats any callback that finds an already-settled verdict as complete success and skips emit_pr_metrics_row. If the first callback commits the judged verdict but analytics emission fails afterward, a Seer retry sees already_settled and returns success without ever emitting the metrics row.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit c80d8a8. Configure here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on the mechanics, but this is a deliberate, documented tradeoff rather than a bug.

The verdict (the durable outcome) is committed in the transaction; emit_pr_metrics_row is best-effort, async-batched telemetry. A retry settling to already_settled preserves the single-emit invariant — re-emitting on retry is exactly the duplicate analytics row we guard against (the deterministic webhook path documents the identical "claim stands, telemetry row forgone on analytics.record failure — acceptable loss" tradeoff). Guaranteeing the emit would require tracking emit-success separately from the verdict, which we're intentionally not doing in this phase.

Leaving as-is by design.

Comment thread src/sentry/pr_metrics/webhooks.py Outdated
_forward_to_judge claims the JUDGE_IN_PROGRESS sentinel before enqueuing the
forward task. If the enqueue itself fails (e.g. broker unavailable), the claim
was committed but no task will ever settle the PR, and redeliveries can't
reclaim (the guard requires a null verdict) — so it sticks in judge_in_progress
forever.

Release the sentinel when delay() raises (only if it's still ours), so a webhook
redelivery re-forwards. This is the cheaply-recoverable case — distinct from a
task that exhausts retries or a callback Seer drops, which the deferred reaper
covers — and mirrors the "don't burn the guard unless we acted" rule already
used for untracked PRs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment on lines +326 to +336
settled = (
PullRequestMetrics.objects.filter(pull_request=pull_request)
.filter(Q(verdict=PullRequestVerdict.JUDGE_IN_PROGRESS) | Q(verdict__isnull=True))
.update(verdict=verdict)
)
if not settled:
logger.info(
"pr_metrics.update.already_settled", extra={**log_extra, "verdict": verdict}
)
metrics.incr("pr_metrics.update.skipped", tags={"reason": "already_settled"})
return {"success": True}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The call to emit_pr_metrics_row is outside the atomic transaction, causing permanent data loss if the emission fails, as retries will not re-emit.
Severity: MEDIUM

Suggested Fix

Move the emit_pr_metrics_row(pull_request=pr) call inside the transaction.atomic block. This ensures that an emission failure will roll back the entire transaction, allowing the operation to be retried correctly without data loss.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/sentry/pr_metrics/judge.py#L322-L336

Potential issue: The call to `emit_pr_metrics_row` in `src/sentry/pr_metrics/judge.py`
occurs after the atomic database transaction that commits the pull request verdict has
completed. If `emit_pr_metrics_row` raises an exception, for example due to a network
error during the `analytics.record` call, the process will halt. Because the verdict has
already been saved to the database, any subsequent retry will detect that the verdict is
settled and will not re-attempt to emit the analytics data. This results in the
permanent loss of the metrics row for that pull request.

Did we get this right? 👍 / 👎 to inform future reviews.

Comment on lines +67 to +70
metrics.incr("pr_metrics.judge.forward_failed", tags={"reason": "repo_not_found"})
return

forward_pr_to_seer_judge(pull_request, repository)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The forward_pr_to_seer_judge task lacks exception handling for ValueError or ValidationError, causing PRs to become permanently stuck in JUDGE_IN_PROGRESS if these errors occur.
Severity: MEDIUM

Suggested Fix

Wrap the call to forward_pr_to_seer_judge within the task in a try...except block to catch ValueError and pydantic.ValidationError. In the except block, reset the PR's verdict from JUDGE_IN_PROGRESS to None to allow future retries or processing. Alternatively, add these exceptions to the task's retry decorator.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/sentry/pr_metrics/tasks.py#L67-L70

Potential issue: The `forward_pr_to_seer_judge` task is configured to retry only on
`HTTPError`. However, the task can fail due to unhandled exceptions like `ValueError`
from `build_repo_definition` (e.g., if a GitLab repository is missing its
`config['path']`) or `pydantic.ValidationError`. When such an unhandled exception
occurs, the task fails without retrying, and there is no mechanism to reset the pull
request's status. As a result, the pull request's verdict remains permanently stuck in
the `JUDGE_IN_PROGRESS` state, preventing it from ever being judged.

Also affects:

  • src/sentry/pr_metrics/judge.py:161~163

Did we get this right? 👍 / 👎 to inform future reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant