feat(pr-metrics): Forward needs-judge terminal PR events to Seer#117512
feat(pr-metrics): Forward needs-judge terminal PR events to Seer#117512vaind wants to merge 12 commits into
Conversation
The judge forward needs the same confidence-ordered attribution snapshot and resolved group ids that emission builds. Promote the two collectors from module-private to public so the forward path reuses the ordering rather than duplicating it. No behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
select_verdict settles the deterministic outcomes (merged_unchanged, closed_unmerged) locally and returns None for everything that needs a judge. Until now those PRs were dropped. Wire the Sentry -> Seer forward so they're handed to Seer to judge, with Seer settling them via the existing update_pr_metrics callback. The webhook claims a JUDGE_IN_PROGRESS sentinel through the same compare-and-set redelivery guard before enqueuing the forward, so a redelivered terminal event can't forward twice. The forward itself is a blocking signed Seer call, so it runs in a task off the webhook request path; a permanent (4xx) rejection is observe-only for now (the row stays claimed and simply never emits), retryable statuses retry. Gated on the new pr-metrics-judge flag independently of emission, so a needs-judge PR keeps today's skip behavior until it's enabled. Make the callback single-shot to match: only the callback that transitions the row off the sentinel (or an unclaimed null) emits, so a retried Seer callback settles to a no-op instead of a duplicate analytics row. The Seer endpoint path is the one value that must match on both sides; it's pending the sibling Seer route and easy to align once that lands. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…pace Use /v1/pr-metrics/pr-close-judge rather than a code_review path; the judge forward owns its own Seer route namespace. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…forward The judge forward sent a flat repo dict with a combined "owner/repo" name, which won't validate into Seer's RepoDefinition — it requires owner and name split, a bare provider slug, and the base/org/privacy fields. Reuse code review's repo definition builder so the forward emits the exact shape Seer already parses, with no normalization on the Seer side. Promote build_repo_definition to public and make its event_payload optional: the forward runs in a task with no webhook payload, so it leaves is_private unknown. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Seer judge returns evidence/reasoning alongside the verdict, but update_pr_metrics had no parameter for it, so Seer's callback couldn't send it without the RPC dispatch raising on an unknown kwarg. Grow the callback to take an optional verdict_details object, validate it at the trust boundary, and persist it on PullRequestMetrics next to the verdict. Carry it onto the emitted analytics row (as a JSON-encoded object) so the judge's reasoning lands in BigQuery beside the verdict it explains — the row schema is meant to grow this way. Null whenever the verdict is, i.e. on the deterministic no-judge path and before a judge result arrives. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
This PR has a migration; here is the generated SQL for for --
-- Add field verdict_details to pullrequestmetrics
--
ALTER TABLE "sentry_pullrequest_metrics" ADD COLUMN "verdict_details" jsonb NULL; |
… from Seer This reverts commit 5a5b744. Reason: The Seer judge isn't implemented end-to-end yet, so nothing produces the reasoning and there's no confirmed need to keep it on Sentry's side. Defer representing verdict_details here (column, callback param, analytics field) until the judge ships and we know it's worth retaining. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The judge forward built its request as a bare dict[str, Any], so mypy couldn't catch a renamed, dropped, or wrong-typed field — the body would only fail at Seer's validation. Type it as a PrCloseJudgeRequest TypedDict, matching how outbound Seer request bodies are typed in signed_seer_api, so contract drift fails the build instead. No behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…pedDict A TypedDict only types the body statically, and the repo sub-object — built by build_repo_definition, which returns dict[str, Any] — stayed an unchecked hole. That is exactly where the earlier owner/name shape mismatch lived, the kind of bug a TypedDict can't catch. Model the body as a pydantic PrCloseJudgeRequest reusing code review's SeerCodeReviewRepoDefinition for the repo field, matching how the sibling prevent-AI forwards (and Seer's own model) are defined. parse_obj now validates the built repo shape before send, so contract drift fails here rather than as a Seer-side rejection, and .json() handles serialization. No behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The judge request sent only end-state aggregates (counts, terminal facts), which flatten away the signals the judge most needs: who pushed the post-open commits (Bot vs human — the merged_unchanged/with_iteration question), review outcomes, labels, and draft transitions. All of that is already captured per-event in PullRequestActivity. Forward those rows as an ordered activity timeline. The stored payloads are structural-only (titles/bodies/comment text are excluded at capture), so each is safe to forward verbatim. This adds only data we already hold; capturing new webhook types (CI/check status, review dismissals, merge-queue) is a follow-up. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit c80d8a8. Configure here.
| "pr_metrics.update.already_settled", extra={**log_extra, "verdict": verdict} | ||
| ) | ||
| metrics.incr("pr_metrics.update.skipped", tags={"reason": "already_settled"}) | ||
| return {"success": True} |
There was a problem hiding this comment.
Retry skips emit after failure
Medium Severity
The single-emit guard treats any callback that finds an already-settled verdict as complete success and skips emit_pr_metrics_row. If the first callback commits the judged verdict but analytics emission fails afterward, a Seer retry sees already_settled and returns success without ever emitting the metrics row.
Reviewed by Cursor Bugbot for commit c80d8a8. Configure here.
There was a problem hiding this comment.
Good catch on the mechanics, but this is a deliberate, documented tradeoff rather than a bug.
The verdict (the durable outcome) is committed in the transaction; emit_pr_metrics_row is best-effort, async-batched telemetry. A retry settling to already_settled preserves the single-emit invariant — re-emitting on retry is exactly the duplicate analytics row we guard against (the deterministic webhook path documents the identical "claim stands, telemetry row forgone on analytics.record failure — acceptable loss" tradeoff). Guaranteeing the emit would require tracking emit-success separately from the verdict, which we're intentionally not doing in this phase.
Leaving as-is by design.
_forward_to_judge claims the JUDGE_IN_PROGRESS sentinel before enqueuing the forward task. If the enqueue itself fails (e.g. broker unavailable), the claim was committed but no task will ever settle the PR, and redeliveries can't reclaim (the guard requires a null verdict) — so it sticks in judge_in_progress forever. Release the sentinel when delay() raises (only if it's still ours), so a webhook redelivery re-forwards. This is the cheaply-recoverable case — distinct from a task that exhausts retries or a callback Seer drops, which the deferred reaper covers — and mirrors the "don't burn the guard unless we acted" rule already used for untracked PRs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| settled = ( | ||
| PullRequestMetrics.objects.filter(pull_request=pull_request) | ||
| .filter(Q(verdict=PullRequestVerdict.JUDGE_IN_PROGRESS) | Q(verdict__isnull=True)) | ||
| .update(verdict=verdict) | ||
| ) | ||
| if not settled: | ||
| logger.info( | ||
| "pr_metrics.update.already_settled", extra={**log_extra, "verdict": verdict} | ||
| ) | ||
| metrics.incr("pr_metrics.update.skipped", tags={"reason": "already_settled"}) | ||
| return {"success": True} |
There was a problem hiding this comment.
Bug: The call to emit_pr_metrics_row is outside the atomic transaction, causing permanent data loss if the emission fails, as retries will not re-emit.
Severity: MEDIUM
Suggested Fix
Move the emit_pr_metrics_row(pull_request=pr) call inside the transaction.atomic block. This ensures that an emission failure will roll back the entire transaction, allowing the operation to be retried correctly without data loss.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/sentry/pr_metrics/judge.py#L322-L336
Potential issue: The call to `emit_pr_metrics_row` in `src/sentry/pr_metrics/judge.py`
occurs after the atomic database transaction that commits the pull request verdict has
completed. If `emit_pr_metrics_row` raises an exception, for example due to a network
error during the `analytics.record` call, the process will halt. Because the verdict has
already been saved to the database, any subsequent retry will detect that the verdict is
settled and will not re-attempt to emit the analytics data. This results in the
permanent loss of the metrics row for that pull request.
Did we get this right? 👍 / 👎 to inform future reviews.
| metrics.incr("pr_metrics.judge.forward_failed", tags={"reason": "repo_not_found"}) | ||
| return | ||
|
|
||
| forward_pr_to_seer_judge(pull_request, repository) |
There was a problem hiding this comment.
Bug: The forward_pr_to_seer_judge task lacks exception handling for ValueError or ValidationError, causing PRs to become permanently stuck in JUDGE_IN_PROGRESS if these errors occur.
Severity: MEDIUM
Suggested Fix
Wrap the call to forward_pr_to_seer_judge within the task in a try...except block to catch ValueError and pydantic.ValidationError. In the except block, reset the PR's verdict from JUDGE_IN_PROGRESS to None to allow future retries or processing. Alternatively, add these exceptions to the task's retry decorator.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/sentry/pr_metrics/tasks.py#L67-L70
Potential issue: The `forward_pr_to_seer_judge` task is configured to retry only on
`HTTPError`. However, the task can fail due to unhandled exceptions like `ValueError`
from `build_repo_definition` (e.g., if a GitLab repository is missing its
`config['path']`) or `pydantic.ValidationError`. When such an unhandled exception
occurs, the task fails without retrying, and there is no mechanism to reset the pull
request's status. As a result, the pull request's verdict remains permanently stuck in
the `JUDGE_IN_PROGRESS` state, preventing it from ever being judged.
Also affects:
src/sentry/pr_metrics/judge.py:161~163
Did we get this right? 👍 / 👎 to inform future reviews.


Problem
select_verdictsettles the deterministic PR outcomes (merged_unchanged,closed_unmerged) locally and returnsNonefor everything that needs a judge — a merge with later commits, a close with engagement, or a missing metrics row. Those needs-judge PRs were previously dropped: logged and skipped, never recorded.What this does
Wires the Sentry → Seer judge round-trip (CORE-217). When a tracked terminal event needs a judge, Sentry forwards it to Seer, and Seer settles it via the existing
update_pr_metricscallback.JUDGE_IN_PROGRESSsentinel via the same compare-and-set redelivery guard before enqueuing, so a redelivered terminal event can't forward twice (tolerant of a missing metrics row — the defer-to-judge case). The forward runs in theforward_pr_to_seertask off the request path (the signed Seer call is blocking); 5xx/429 retry, a permanent 4xx is observe-only (the row stays claimed and never emits — no reaper this phase).PrCloseJudgeRequest(typed so contract drift fails the build, not at Seer).reporeuses the shared SeerRepoDefinitionviabuild_repo_definition(split owner/name, bare provider slug). The body also carries the capturedPullRequestActivitytimeline — the event sequence and actors (Bot vs human pushes, review outcomes, labels, draft transitions) that the end-state counters flatten away. Payloads are structural-only (no titles/bodies/comment text).update_pr_metricsonly emits on the transition off the sentinel (or an unclaimed null), so a retried Seer callback settles to a no-op rather than a duplicate analytics row. The sentinel is rejected if Seer ever echoes it back as a result.PullRequestVerdict.JUDGE_IN_PROGRESS— no migration (theverdictcolumn is aCharField; Sentry's detector ignoreschoiceschanges).organizations:pr-metrics-judge, independent of emission — a needs-judge PR keeps today's skip behavior until the flag is on.Supporting refactors: promote the attribution/group collectors to public (shared by emission and the forward); promote
build_repo_definitionto public with an optionalevent_payload(the task-side forward has no webhook payload).Contract sync
PrCloseJudgeRequestis a manual mirror of Seer'ssrc/seer/pr_metrics/models.py(no shared package/codegen; both sides validate with pydantic, so drift surfaces loudly). The path/v1/pr-metrics/pr-close-judgematches the Seer route from CW-1436 (implemented).Rollout & follow-ups
activitytimeline is forward-compatible — Seer adds anactivityfield to consume it (pairs with the real judges CW-1439 / CW-1440).Refs CORE-217