DRAFT: Add AI-powered issue triage agent (dry-run, opt-in) by benhillis · Pull Request #40525 · microsoft/WSL

benhillis · 2026-05-13T16:17:45Z

What

Adds an opt-in, AI-powered issue triage agent that complements the existing rule-based wti.exe pipeline. Purely additive — wti and its workflows (new_issue.yml, new_issue_comment.yml, issue_edited.yml) are completely untouched.

For a designated issue, the agent reads the title and body, asks an LLM (via GitHub Models) to classify it and surface possible duplicates, then posts a single collapsible maintainer-facing comment with its analysis.

What it does

Fetches the issue via gh api.
Runs skip rules (closed / locked / bot author / maintainer author / body < 50 chars / already-triaged-and-unchanged).
Pre-fetches ~15 duplicate candidates via gh search issues (the model picks from this list — it cannot invent issue numbers).
Calls openai/gpt-4o-mini via the gh-models extension with a version-controlled prompt.
Validates and clamps the JSON response against fixed allowlists (issue type, component labels, candidate numbers).
Sanitizes the model's prose (HTML-stripped, URLs stripped, @mentions defanged with U+200B; emails preserved by negative lookbehind).
Re-fetches the issue and re-hashes to detect slow-run-vs-newer-edit races, then upserts a single comment keyed by an HTML-comment marker so re-runs replace rather than duplicate.

What it does NOT do (v1)

No labels are added or removed. Output is suggestion-only.
No closing, locking, reassigning, or @-mentioning.
No re-trigger on issue edits, comments, or PRs.

Example of the resulting comment:

🤖 AI triage summary (suggestions, dry-run — not auto-applied)
Summary: The user reports being unable to start WSL2 after a crash, receiving an HCS/ERROR_FILE_NOT_FOUND message …
Suggested type: bug
Suggested component labels: wsl2, install
Possible duplicates: #12753, #11253

Initial rollout: manual-dispatch only

The workflow's issues.opened trigger is commented out in this PR. The only way to invoke it is Actions → AI issue triage (dry-run) → Run workflow → enter issue number. This lets the team vet output quality on hand-picked real issues before opening the firehose. To enable automatic triage on every new issue later, uncomment the issues: block in .github/workflows/ai_triage.yml.

Suggested rollout:

Land this PR.
Have maintainers manually dispatch on ~20 fresh issues over a week. Read the comments. Tune triage/ai/prompt.md if needed.
If the signal-to-noise looks good, flip on issues.opened.
Per the graduation plan in triage/ai/README.md, eventually consider v2 auto-labeling for high-confidence subsets.

Security / abuse posture

Issue body is untrusted input. The prompt explicitly tells the model to ignore instructions inside it.
All model output is JSON-schema-validated. Type and label suggestions are intersected with both a static allowlist and the live gh label list output. Duplicate numbers must appear in the pre-fetched candidate set.
Summary text is HTML-tag-stripped, URL-stripped, mention-defanged, length-capped at 400 chars, and html.escape()'d during render. Renders inside a <details> collapsed by default.
Skips bot authors (by author.type == "Bot" and [bot] login suffix) and maintainer-level author_association (OWNER / MEMBER / COLLABORATOR).
Idempotency marker  is keyed on (title, body, prompt-version), re-checked after the model call.
concurrency: cancel-in-progress per issue prevents pile-ups on rapid edits.
Failure model is two-tiered: model / network / JSON errors silently exit 0 (no comment posted, workflow green); permission errors and unexpected exceptions exit 1 (workflow red, maintainers see it).

Permissions

permissions:
  issues: write     # post the triage comment
  models: read      # GitHub Models inference (per actions/ai-inference docs)
  contents: read    # checkout

Tests

triage/ai/test_ai_triage.py — 82 unit tests, no network, ~0.2 s. Covers JSON extraction, validation/clamping, sanitization (including HTML / URL / mention / drift guards comparing the Python allowlists to the prompt template), idempotency hashing, marker parsing, comment rendering, and skip rules. Wired into CI via .github/workflows/ai_triage_tests.yml which runs on any PR touching triage/ai/** or .github/workflows/ai_triage*.yml.

End-to-end smoke testing was done from benhillis/WSL:

Dry-run against historical issue Cannot start WSL: Wsl/Service/CreateInstance/CreateVm/MountVhd/HCS/ERROR_FILE_NOT_FOUND #40488 (HCS error) → correctly classified as bug / wsl2+install and surfaced Cannot start WSL: Wsl/Service/CreateInstance/CreateVm/MountVhd/HCS/ERROR_FILE_NOT_FOUND #12753 and wsl startup failed after windows update Error code: Wsl/Service/CreateInstance/CreateVm/MountVhd/HCS/ERROR_NOT_FOUND #11253 as real duplicates.
Dry-run against "wsl --manage --print-status" '/feature' #40483 → correctly classified as feature / distro-mgmt.
Workflow-level dispatch on the fork → all stages succeeded (gh extension install github/gh-models, models: read permission resolution, model call, JSON parsing, validation, dedup retrieval, staleness re-check). Comment-upsert correctly failed loudly with HTTP 403 because the fork token has no write access to the upstream repo.

Files

Path	Purpose
`.github/workflows/ai_triage.yml`	Main workflow (manual-dispatch only initially)
`.github/workflows/ai_triage_tests.yml`	Pytest CI gate on triage/ai changes
`triage/ai/ai_triage.py`	Agent script (~750 lines)
`triage/ai/prompt.md`	Version-controlled LLM prompt template
`triage/ai/test_ai_triage.py`	82 unit tests, no network
`triage/ai/README.md`	Design rationale, local-run guide, graduation plan
`triage/ai/.gitignore`	`__pycache__` / `.pytest_cache`

Cost

Manual-dispatch only at first → effectively zero. Once issues.opened is enabled, one gpt-4o-mini call per non-skipped new issue (typical ~3–5K input tokens, ~300 output tokens). Within GitHub Models free quota for current WSL issue volume.

How to disable / roll back

Disable: in repo settings, Actions → Workflows → AI issue triage (dry-run) → Disable workflow. Or delete .github/workflows/ai_triage.yml.
Nothing else in the repo depends on these files.

Local run

cd triage/ai
python ai_triage.py --issue 12345 --dry-run

Requires gh CLI authenticated, and gh extension install github/gh-models.

Opening as draft for design discussion before merge. Happy to split into multiple PRs (workflow + script + tests + docs) if that's preferred for review.

Adds a complementary triage agent for new issues that runs in parallel with the existing rule-based wti pipeline. Reads the issue prose, asks an LLM via GitHub Models to classify it, and posts a single collapsible maintainer- facing comment with: a 1-3 sentence summary, suggested issue type, suggested component labels, missing bug-template fields, and possible duplicate issues (retrieved via gh search issues, never invented by the model). v1 is dry-run only -- no labels are applied, no issue state is changed. The model is treated as untrusted: JSON output is schema-validated, labels are intersected with both a static allowlist and the live label list, duplicate numbers are intersected with the pre-fetched candidate set, and the summary is HTML-escaped with markdown links / URLs / @mentions stripped. Idempotency uses an input-sha hash embedded in the marker comment, with a post-call re-fetch to defeat slow-run-vs-newer-edit races. Failures (model error, malformed JSON, rate limit) are silent: log to stderr, exit 0, no comment posted. Files: - .github/workflows/ai_triage.yml - workflow on issues.opened - .github/workflows/ai_triage_tests.yml - pytest CI gate on PRs touching the script - triage/ai/ai_triage.py - the agent script - triage/ai/prompt.md - version-controlled prompt template - triage/ai/test_ai_triage.py - 79 pure-function unit tests - triage/ai/README.md - design + run instructions + graduation plan - triage/ai/.gitignore - exclude __pycache__/.pytest_cache Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- sanitize_summary now strips raw HTML tags (defense-in-depth; the function name implied this and html.escape() in render_comment was the only thing preventing tags from flowing through) - main() now exits 1 on unexpected exceptions instead of swallowing them; expected silent paths (model errors, transient gh API errors on read) still exit 0 inline. Comment-upsert failures now propagate, so a 403/5xx from posting fails the workflow visibly instead of showing green. - Workflow concurrency group falls back to github.run_id if both event payload and inputs are missing, preventing a collapsed-group deadlock. - REPO is now overridable via AI_TRIAGE_REPO env var so fork testing can target the fork's own issues without write-blocking 403s. - Updated README failure-mode section to document the silent-vs-loud split. - Added 3 sanitize_summary tests for HTML stripping (script tag, attribute strip, HTML comment). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Manual workflow_dispatch only until maintainers validate output quality on real issues. Re-enable by uncommenting the issues: trigger block. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Adds a new, opt-in AI-powered issue triage agent under triage/ai/ plus GitHub Actions workflows to run it manually (initially) and to gate changes with unit tests. This complements (and does not modify) the existing rule-based triage tooling.

Changes:

Introduces triage/ai/ai_triage.py to fetch an issue, retrieve duplicate candidates, call GitHub Models, validate/clamp results, and upsert a single maintainer-facing comment.
Adds a version-controlled prompt (triage/ai/prompt.md) and unit tests (triage/ai/test_ai_triage.py) focused on validation/sanitization/idempotency behavior.
Adds Actions workflows for manual dispatch triage and for running pytest on PRs touching triage/ai/**.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`triage/ai/ai_triage.py`	Core triage agent: issue fetch, candidate retrieval, model call, JSON extraction/validation, sanitization, comment upsert.
`triage/ai/prompt.md`	Prompt template and strict output schema/allowlists for the model.
`triage/ai/test_ai_triage.py`	Unit tests covering parsing, validation/clamping, sanitization, hashing/marker behavior, and skip rules.
`triage/ai/README.md`	Design/usage documentation and operational posture (skip rules, failure modes, rollout plan).
`triage/ai/.gitignore`	Ignores Python cache and pytest artifacts for the new triage module.
`.github/workflows/ai_triage.yml`	Manual-dispatch workflow to run the triage agent and post/update the comment.
`.github/workflows/ai_triage_tests.yml`	CI workflow running pytest for `triage/ai/**` changes.

- render_prompt: switch from sequential .replace() to single-pass re.sub. Sequential replacement let an issue body containing '{{CANDIDATES_JSON}}' (or any other placeholder token) get rewritten by a later substitution, giving the issue author a way to alter the prompt. Closes the inline review thread on ai_triage.py:331. - find_existing_marker_comment: paginate up to 10 pages (1000 comments) instead of fetching only the first 100. Stops on first marker hit, on a short page (last page), or at the safety cap. Closes the inline thread on ai_triage.py:575. - Workflow input 'issue' is now type: number so non-numeric values are rejected at the dispatch UI instead of crashing the script. - README files-table updated to reflect the manual-only initial rollout. - Removed dead _JSON_OBJECT_RE constant left over from when extract_json_object used a regex. - Added 9 unit tests: 4 for render_prompt (basic substitution + 2 prompt-injection regression tests + unknown-placeholder passthrough) and 5 for find_existing_marker_comment (page-1 hit, page-2 hit, short-page stop, empty-page stop, cap). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

benhillis · 2026-05-13T17:42:11Z

Thanks for the review! All 5 findings addressed in bad3a6e.

#	Finding	Resolution
1	`render_prompt` sequential `.replace()` lets issue body inject placeholders	Replaced with single-pass `re.sub` over a placeholder→value map. An issue body containing `{{CANDIDATES_JSON}}` is now treated as literal text. Added 2 regression tests.
2	`find_existing_marker_comment` only fetches first 100 comments	Now paginates newest-first, capped at 10 pages (1000 comments). Stops on first marker hit, short page, or safety cap. Added 5 unit tests covering page-1 hit, page-2 hit, short-page stop, empty-page stop, and cap.
3	`workflow_dispatch` `issue` input declared as `string` but script needs `int`	Changed to `type: number`.
4	README claims `issues.opened` trigger is active	Updated to reflect the manual-only initial rollout.
5	`_JSON_OBJECT_RE` dead code	Removed.

Tests: 91 pass (was 82). All ~0.2s, no network.

Ben Hillis and others added 3 commits May 13, 2026 08:22

Disable issues.opened trigger for initial rollout

5ac25e3

Manual workflow_dispatch only until maintainers validate output quality on real issues. Re-enable by uncommenting the issues: trigger block. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 13, 2026 16:17

Copilot started reviewing on behalf of benhillis May 13, 2026 16:18 View session

Copilot AI reviewed May 13, 2026

View reviewed changes

Comment thread triage/ai/ai_triage.py Outdated

Comment thread triage/ai/ai_triage.py Outdated

Comment thread .github/workflows/ai_triage.yml Outdated

Comment thread triage/ai/README.md Outdated

Comment thread triage/ai/ai_triage.py Outdated

benhillis changed the title ~~Add AI-powered issue triage agent (dry-run, opt-in)~~ DRAFT: Add AI-powered issue triage agent (dry-run, opt-in) May 13, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DRAFT: Add AI-powered issue triage agent (dry-run, opt-in)#40525

DRAFT: Add AI-powered issue triage agent (dry-run, opt-in)#40525
benhillis wants to merge 4 commits into
microsoft:masterfrom
benhillis:ai-triage-agent

benhillis commented May 13, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

benhillis commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

benhillis commented May 13, 2026

What

What it does

What it does NOT do (v1)

Initial rollout: manual-dispatch only

Security / abuse posture

Permissions

Tests

Files

Cost

How to disable / roll back

Local run

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

benhillis commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants