Skip to content

DRAFT: Add AI-powered issue triage agent (dry-run, opt-in)#40525

Draft
benhillis wants to merge 4 commits into
microsoft:masterfrom
benhillis:ai-triage-agent
Draft

DRAFT: Add AI-powered issue triage agent (dry-run, opt-in)#40525
benhillis wants to merge 4 commits into
microsoft:masterfrom
benhillis:ai-triage-agent

Conversation

@benhillis
Copy link
Copy Markdown
Member

What

Adds an opt-in, AI-powered issue triage agent that complements the existing rule-based wti.exe pipeline. Purely additivewti and its workflows (new_issue.yml, new_issue_comment.yml, issue_edited.yml) are completely untouched.

For a designated issue, the agent reads the title and body, asks an LLM (via GitHub Models) to classify it and surface possible duplicates, then posts a single collapsible maintainer-facing comment with its analysis.

What it does

  1. Fetches the issue via gh api.
  2. Runs skip rules (closed / locked / bot author / maintainer author / body < 50 chars / already-triaged-and-unchanged).
  3. Pre-fetches ~15 duplicate candidates via gh search issues (the model picks from this list — it cannot invent issue numbers).
  4. Calls openai/gpt-4o-mini via the gh-models extension with a version-controlled prompt.
  5. Validates and clamps the JSON response against fixed allowlists (issue type, component labels, candidate numbers).
  6. Sanitizes the model's prose (HTML-stripped, URLs stripped, @mentions defanged with U+200B; emails preserved by negative lookbehind).
  7. Re-fetches the issue and re-hashes to detect slow-run-vs-newer-edit races, then upserts a single comment keyed by an HTML-comment marker so re-runs replace rather than duplicate.

What it does NOT do (v1)

  • No labels are added or removed. Output is suggestion-only.
  • No closing, locking, reassigning, or @-mentioning.
  • No re-trigger on issue edits, comments, or PRs.

Example of the resulting comment:

🤖 AI triage summary (suggestions, dry-run — not auto-applied)
Summary: The user reports being unable to start WSL2 after a crash, receiving an HCS/ERROR_FILE_NOT_FOUND message …
Suggested type: bug
Suggested component labels: wsl2, install
Possible duplicates: #12753, #11253

Initial rollout: manual-dispatch only

The workflow's issues.opened trigger is commented out in this PR. The only way to invoke it is Actions → AI issue triage (dry-run) → Run workflow → enter issue number. This lets the team vet output quality on hand-picked real issues before opening the firehose. To enable automatic triage on every new issue later, uncomment the issues: block in .github/workflows/ai_triage.yml.

Suggested rollout:

  1. Land this PR.
  2. Have maintainers manually dispatch on ~20 fresh issues over a week. Read the comments. Tune triage/ai/prompt.md if needed.
  3. If the signal-to-noise looks good, flip on issues.opened.
  4. Per the graduation plan in triage/ai/README.md, eventually consider v2 auto-labeling for high-confidence subsets.

Security / abuse posture

  • Issue body is untrusted input. The prompt explicitly tells the model to ignore instructions inside it.
  • All model output is JSON-schema-validated. Type and label suggestions are intersected with both a static allowlist and the live gh label list output. Duplicate numbers must appear in the pre-fetched candidate set.
  • Summary text is HTML-tag-stripped, URL-stripped, mention-defanged, length-capped at 400 chars, and html.escape()'d during render. Renders inside a <details> collapsed by default.
  • Skips bot authors (by author.type == "Bot" and [bot] login suffix) and maintainer-level author_association (OWNER / MEMBER / COLLABORATOR).
  • Idempotency marker <!-- ai-triage:v1 input-sha=… prompt-sha=… --> is keyed on (title, body, prompt-version), re-checked after the model call.
  • concurrency: cancel-in-progress per issue prevents pile-ups on rapid edits.
  • Failure model is two-tiered: model / network / JSON errors silently exit 0 (no comment posted, workflow green); permission errors and unexpected exceptions exit 1 (workflow red, maintainers see it).

Permissions

permissions:
  issues: write     # post the triage comment
  models: read      # GitHub Models inference (per actions/ai-inference docs)
  contents: read    # checkout

Tests

triage/ai/test_ai_triage.py82 unit tests, no network, ~0.2 s. Covers JSON extraction, validation/clamping, sanitization (including HTML / URL / mention / drift guards comparing the Python allowlists to the prompt template), idempotency hashing, marker parsing, comment rendering, and skip rules. Wired into CI via .github/workflows/ai_triage_tests.yml which runs on any PR touching triage/ai/** or .github/workflows/ai_triage*.yml.

End-to-end smoke testing was done from benhillis/WSL:

Files

Path Purpose
.github/workflows/ai_triage.yml Main workflow (manual-dispatch only initially)
.github/workflows/ai_triage_tests.yml Pytest CI gate on triage/ai changes
triage/ai/ai_triage.py Agent script (~750 lines)
triage/ai/prompt.md Version-controlled LLM prompt template
triage/ai/test_ai_triage.py 82 unit tests, no network
triage/ai/README.md Design rationale, local-run guide, graduation plan
triage/ai/.gitignore __pycache__ / .pytest_cache

Cost

Manual-dispatch only at first → effectively zero. Once issues.opened is enabled, one gpt-4o-mini call per non-skipped new issue (typical ~3–5K input tokens, ~300 output tokens). Within GitHub Models free quota for current WSL issue volume.

How to disable / roll back

  • Disable: in repo settings, Actions → Workflows → AI issue triage (dry-run) → Disable workflow. Or delete .github/workflows/ai_triage.yml.
  • Nothing else in the repo depends on these files.

Local run

cd triage/ai
python ai_triage.py --issue 12345 --dry-run

Requires gh CLI authenticated, and gh extension install github/gh-models.


Opening as draft for design discussion before merge. Happy to split into multiple PRs (workflow + script + tests + docs) if that's preferred for review.

Ben Hillis and others added 3 commits May 13, 2026 08:22
Adds a complementary triage agent for new issues that runs in parallel with
the existing rule-based wti pipeline. Reads the issue prose, asks an LLM via
GitHub Models to classify it, and posts a single collapsible maintainer-
facing comment with: a 1-3 sentence summary, suggested issue type, suggested
component labels, missing bug-template fields, and possible duplicate issues
(retrieved via gh search issues, never invented by the model).

v1 is dry-run only -- no labels are applied, no issue state is changed.

The model is treated as untrusted: JSON output is schema-validated, labels
are intersected with both a static allowlist and the live label list,
duplicate numbers are intersected with the pre-fetched candidate set, and
the summary is HTML-escaped with markdown links / URLs / @mentions stripped.
Idempotency uses an input-sha hash embedded in the marker comment, with a
post-call re-fetch to defeat slow-run-vs-newer-edit races.

Failures (model error, malformed JSON, rate limit) are silent: log to stderr,
exit 0, no comment posted.

Files:
- .github/workflows/ai_triage.yml       - workflow on issues.opened
- .github/workflows/ai_triage_tests.yml - pytest CI gate on PRs touching the script
- triage/ai/ai_triage.py                - the agent script
- triage/ai/prompt.md                   - version-controlled prompt template
- triage/ai/test_ai_triage.py           - 79 pure-function unit tests
- triage/ai/README.md                   - design + run instructions + graduation plan
- triage/ai/.gitignore                  - exclude __pycache__/.pytest_cache

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- sanitize_summary now strips raw HTML tags (defense-in-depth; the function name implied this and html.escape() in render_comment was the only thing preventing tags from flowing through)

- main() now exits 1 on unexpected exceptions instead of swallowing them; expected silent paths (model errors, transient gh API errors on read) still exit 0 inline. Comment-upsert failures now propagate, so a 403/5xx from posting fails the workflow visibly instead of showing green.

- Workflow concurrency group falls back to github.run_id if both event payload and inputs are missing, preventing a collapsed-group deadlock.

- REPO is now overridable via AI_TRIAGE_REPO env var so fork testing can target the fork's own issues without write-blocking 403s.

- Updated README failure-mode section to document the silent-vs-loud split.

- Added 3 sanitize_summary tests for HTML stripping (script tag, attribute strip, HTML comment).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Manual workflow_dispatch only until maintainers validate output quality on real issues. Re-enable by uncommenting the issues: trigger block.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 13, 2026 16:17
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new, opt-in AI-powered issue triage agent under triage/ai/ plus GitHub Actions workflows to run it manually (initially) and to gate changes with unit tests. This complements (and does not modify) the existing rule-based triage tooling.

Changes:

  • Introduces triage/ai/ai_triage.py to fetch an issue, retrieve duplicate candidates, call GitHub Models, validate/clamp results, and upsert a single maintainer-facing comment.
  • Adds a version-controlled prompt (triage/ai/prompt.md) and unit tests (triage/ai/test_ai_triage.py) focused on validation/sanitization/idempotency behavior.
  • Adds Actions workflows for manual dispatch triage and for running pytest on PRs touching triage/ai/**.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
triage/ai/ai_triage.py Core triage agent: issue fetch, candidate retrieval, model call, JSON extraction/validation, sanitization, comment upsert.
triage/ai/prompt.md Prompt template and strict output schema/allowlists for the model.
triage/ai/test_ai_triage.py Unit tests covering parsing, validation/clamping, sanitization, hashing/marker behavior, and skip rules.
triage/ai/README.md Design/usage documentation and operational posture (skip rules, failure modes, rollout plan).
triage/ai/.gitignore Ignores Python cache and pytest artifacts for the new triage module.
.github/workflows/ai_triage.yml Manual-dispatch workflow to run the triage agent and post/update the comment.
.github/workflows/ai_triage_tests.yml CI workflow running pytest for triage/ai/** changes.

Comment thread triage/ai/ai_triage.py Outdated
Comment thread triage/ai/ai_triage.py Outdated
Comment thread .github/workflows/ai_triage.yml Outdated
Comment thread triage/ai/README.md Outdated
Comment thread triage/ai/ai_triage.py Outdated
@benhillis benhillis changed the title Add AI-powered issue triage agent (dry-run, opt-in) DRAFT: Add AI-powered issue triage agent (dry-run, opt-in) May 13, 2026
- render_prompt: switch from sequential .replace() to single-pass re.sub. Sequential replacement let an issue body containing '{{CANDIDATES_JSON}}' (or any other placeholder token) get rewritten by a later substitution, giving the issue author a way to alter the prompt. Closes the inline review thread on ai_triage.py:331.

- find_existing_marker_comment: paginate up to 10 pages (1000 comments) instead of fetching only the first 100. Stops on first marker hit, on a short page (last page), or at the safety cap. Closes the inline thread on ai_triage.py:575.

- Workflow input 'issue' is now type: number so non-numeric values are rejected at the dispatch UI instead of crashing the script.

- README files-table updated to reflect the manual-only initial rollout.

- Removed dead _JSON_OBJECT_RE constant left over from when extract_json_object used a regex.

- Added 9 unit tests: 4 for render_prompt (basic substitution + 2 prompt-injection regression tests + unknown-placeholder passthrough) and 5 for find_existing_marker_comment (page-1 hit, page-2 hit, short-page stop, empty-page stop, cap).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@benhillis
Copy link
Copy Markdown
Member Author

Thanks for the review! All 5 findings addressed in bad3a6e.

# Finding Resolution
1 render_prompt sequential .replace() lets issue body inject placeholders Replaced with single-pass re.sub over a placeholder→value map. An issue body containing {{CANDIDATES_JSON}} is now treated as literal text. Added 2 regression tests.
2 find_existing_marker_comment only fetches first 100 comments Now paginates newest-first, capped at 10 pages (1000 comments). Stops on first marker hit, short page, or safety cap. Added 5 unit tests covering page-1 hit, page-2 hit, short-page stop, empty-page stop, and cap.
3 workflow_dispatch issue input declared as string but script needs int Changed to type: number.
4 README claims issues.opened trigger is active Updated to reflect the manual-only initial rollout.
5 _JSON_OBJECT_RE dead code Removed.

Tests: 91 pass (was 82). All ~0.2s, no network.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants