feat: gstack-inspired AI PR review pipeline with HuggingFace triage, Claude review, and auto-merge#305

Open
schneidermr wants to merge 1 commit into garrytan:main from bitkaio:feat/gstack-pr-review-pipeline

Conversation

@schneidermr

Summary

Adds a 4-step AI-powered PR review pipeline to GitHub Actions, inspired by
gstack's /review and /plan-eng-review
skill files. The pipeline classifies, reviews, scores, and optionally merges PRs
— fully automated, with structured JSON output and a complete audit trail.

Motivation

AI-assisted code generation is accelerating faster than teams can review it.
Anthropic's Claude Code Review solves this at the enterprise tier ($15–25/review,
20 min, Teams/Enterprise only). This pipeline brings comparable structured
review to any GitHub repo at a fraction of the cost ($0.10–0.30/review)
by combining a lightweight open-source triage model with Claude's reasoning
capabilities and gstack's battle-tested review principles.

Architecture

Step 1: Triage      → Qwen2.5-3B (HuggingFace Inference API)
Step 2: Compile     → Claude Sonnet (prompt compilation)
Step 3: Review      → Claude Sonnet/Opus (structured code review)
Step 4: Route       → Deterministic bash (approve/reject/comment/merge)

Step 1 — Triage (HuggingFace, ~2-5s, ~free)

Gathers full PR context via GitHub API: diff, inline review comments,
conversation thread, and linked issues. Classifies the PR by type, risk,
size, and review depth using Qwen2.5-3B-Instruct. Falls back to
rule-based heuristics if the HF API is unavailable.
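For reference, the fallback could be as simple as the sketch below. The thresholds, labels, and field names here are illustrative assumptions, not the actual triage.py logic:

```python
# Hypothetical sketch of the rule-based triage fallback; thresholds,
# labels, and field names are illustrative, not the real triage.py.

def heuristic_triage(diff: str, changed_files: list[str]) -> dict:
    """Classify a PR by size and risk when the HF API is unreachable."""
    # Count added/removed lines, skipping the +++/--- file headers.
    lines_changed = sum(
        1 for line in diff.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )
    size = "small" if lines_changed < 50 else "medium" if lines_changed < 400 else "large"

    # Touching CI, auth, or dependency files is treated as higher risk.
    risky_markers = (".github/", "auth", "requirements", "package.json", "Dockerfile")
    risk = "high" if any(m in f for f in changed_files for m in risky_markers) else "low"

    return {
        "type": "unknown",  # the LLM normally fills this in
        "size": size,
        "risk": risk,
        "review_depth": "deep" if risk == "high" or size == "large" else "standard",
        "source": "heuristic-fallback",
    }
```

The point of the fallback is that it only needs to be directionally right: a misclassified size costs a slightly longer review prompt, not a wrong merge decision.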

Step 2 — Prompt Compilation (Claude Sonnet, ~10-15s, ~$0.01-0.03)

Reads the actual gstack skill files (review/SKILL.md and
plan-eng-review/SKILL.md) from the main branch, plus the triage
output. Following the compile-instructions.md meta-prompt, Claude
strips interactive patterns, extracts the review principles, and
compiles a tailored single-pass review prompt optimized for this
specific PR's type, risk level, and review context.

Step 3 — Deep Review (Claude Sonnet, ~30-90s, ~$0.05-0.30)

Executes the compiled prompt against the actual PR diff. Produces a
structured JSON result with 5-dimension scores (design, security,
performance, test coverage, completeness), severity-classified findings
with file/line references, and a verdict.
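The contract might look roughly like the example below. The exact field names live in review-schema.json, so treat this shape as a guess at the schema, not a copy of it:

```python
import json

# Illustrative example of the structured review output; the real field
# names are defined in review-schema.json, so this shape is an assumption.
review = {
    "scores": {  # 1-10 per dimension
        "design": 8,
        "security": 9,
        "performance": 7,
        "test_coverage": 6,
        "completeness": 8,
    },
    "findings": [
        {
            "severity": "major",  # critical | major | minor | nit
            "file": "src/api/client.py",  # hypothetical path
            "line": 42,
            "message": "HTTP call has no timeout; a hung upstream blocks the worker.",
        }
    ],
    "verdict": "comment",  # approve | comment | request_changes
    "confidence": 0.82,
}

# Step 4 consumes this as plain JSON, so it must round-trip cleanly.
payload = json.dumps(review)
assert json.loads(payload)["verdict"] == "comment"
```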

Step 4 — Action Routing (bash, ~1-2s, free)

Pure deterministic logic — reads the review JSON and triage output,
then calls GitHub API to:

  • Approve, request changes, or post review comments
  • Add/remove labels (ai-approved, needs-work, security-review-needed, etc.)
  • Enable auto-merge for qualifying low-risk PRs (gated by AUTO_MERGE_ENABLED repo variable)
  • Escalate critical security findings

Decision Matrix

| Condition | Action |
| --- | --- |
| Score ≥ 9, no critical/major findings, auto-mergeable | Approve + auto-merge (if enabled) |
| Score ≥ 7, no critical findings | Approve + ai-review-passed |
| Moderate issues, non-blocking | Comment only + needs-human-review |
| Critical or multiple major findings | Request changes + needs-work |
| Confidence < 0.7 | Comment only + escalate to human |
| Any critical security finding | Always block + security-review-needed |
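Because the routing step is deterministic, the matrix above reduces to a pure function. This Python sketch restates the matrix as described; the real logic lives in route-action.sh, so treat this as a readable restatement, not the source:

```python
# Python restatement of the Step 4 decision matrix (the actual routing
# is route-action.sh). Branch order matters: security blocks and low
# confidence are checked before any approval path.

def route(score: float, findings: list[dict], confidence: float,
          auto_mergeable: bool, auto_merge_enabled: bool) -> tuple[str, list[str]]:
    """Return (action, labels) for one review result."""
    severities = [f["severity"] for f in findings]
    security_critical = any(
        f["severity"] == "critical" and f.get("category") == "security"
        for f in findings
    )
    if security_critical:                                   # always block
        return ("request_changes", ["security-review-needed"])
    if confidence < 0.7:                                    # escalate to human
        return ("comment", ["needs-human-review"])
    if "critical" in severities or severities.count("major") >= 2:
        return ("request_changes", ["needs-work"])
    if score >= 9 and "major" not in severities and auto_mergeable:
        action = "approve_and_merge" if auto_merge_enabled else "approve"
        return (action, ["ai-approved"])
    if score >= 7:
        return ("approve", ["ai-review-passed"])
    return ("comment", ["needs-human-review"])
```

Keeping this branch in plain code (rather than asking the LLM to choose an action) is what makes the audit trail meaningful: the same JSON always produces the same action.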

Key Design Decisions

  • Multi-model pipeline: Uses a cheap, fast model (Qwen 2.5 3B) for
    classification and an expensive, capable model (Claude) for reasoning.
    This keeps costs ~50-100x lower than Claude Code Review.
  • gstack skill adaptation: Review principles are read directly from the
    actual review/SKILL.md and plan-eng-review/SKILL.md on the main branch
    — not static copies. A compile-instructions.md meta-prompt tells Claude how
    to strip interactive patterns and compile them into a headless CI review prompt.
    When gstack updates its skills, the pipeline automatically picks up the changes.
  • Structured JSON contract: The review schema (review-schema.json)
    is the formal interface between the LLM and the automation layer.
    Every review decision is a downloadable artifact for audit.
  • Auto-merge toggle: AUTO_MERGE_ENABLED repo variable (default: false)
    provides a global kill switch. Even when enabled, auto-merge requires:
    score ≥ 9, no critical/major findings, triage approval, AND all other
    CI checks passing.
  • Graceful degradation: If HuggingFace is down, triage falls back to
    heuristics. The pipeline never fails silently at classification.

Files

.github/
├── workflows/
│   └── gstack-pr-review.yml         # Main 4-step workflow
└── gstack-review/
    ├── triage.py                      # Step 1: HF Qwen triage classifier
    ├── route-action.sh                # Step 4: Deterministic action routing
    ├── compile-instructions.md        # Step 2: Meta-prompt for skill compilation
    └── review-schema.json             # Review output JSON schema

Note: The pipeline reads review/SKILL.md and plan-eng-review/SKILL.md
from the main branch at runtime. These are the actual gstack skill files,
not copies. When the skills are updated, the pipeline automatically picks
up the changes.

Setup Required

Secrets

  • ANTHROPIC_API_KEY (required) — Claude API access
  • HF_TOKEN (recommended) — HuggingFace Inference API

Variables

  • AUTO_MERGE_ENABLED — set to "true" to enable auto-merge, default "false"

Optional (for merge/approve authority)

  • APP_ID (variable) + APP_PRIVATE_KEY (secret) — GitHub App credentials

Triggers

  • pull_request: [opened, synchronize, ready_for_review]
  • issue_comment containing @gstack-review (manual re-trigger)

Cost per Review

| Step | Model | Cost |
| --- | --- | --- |
| Triage | Qwen2.5-3B (HF) | ~free |
| Prompt compile | Claude Sonnet | ~$0.01–0.03 |
| Deep review | Claude Sonnet | ~$0.05–0.30 |
| Routing | None | free |
| Total | | ~$0.06–0.33 |

Compare: Anthropic Claude Code Review = $15-25/review (Teams/Enterprise only).

@schneidermr
Author

schneidermr commented Mar 21, 2026

🤖 gstack as a GitHub Action — your skills, running on every PR, automatically

What if /review and /plan-eng-review ran on every PR without anyone typing a slash command?

This PR turns gstack's review skills into a fully automated GitHub Actions pipeline. It reads the actual review/SKILL.md and plan-eng-review/SKILL.md from main, compiles them into a headless review prompt, and uses Claude to score, comment, approve, or reject PRs — with auto-merge for the safe stuff.


The Pipeline

PR opened
    │
    ▼
┌─ Step 1: TRIAGE ─────────────────────────────────────────────┐
│  Qwen 2.5 3B (HuggingFace) • ~2s • ~free                     │
│  Reads: diff, review comments, conversation, linked issues   │
│  → classifies type, risk, review depth                       │
└──────────────────────────────────┬───────────────────────────┘
                                   ▼
┌─ Step 2: PROMPT COMPILATION ─────────────────────────────────┐
│  Claude Sonnet • ~10s • ~$0.02                               │
│  Reads: review/SKILL.md + plan-eng-review/SKILL.md (main)    │
│  → strips interactive patterns, compiles CI review prompt    │
└──────────────────────────────────┬───────────────────────────┘
                                   ▼
┌─ Step 3: DEEP REVIEW ───────────────────────────────────────┐
│  Claude Sonnet • ~30-60s • ~$0.10-0.30                      │
│  → 5-dimension scores, severity-ranked findings, verdict    │
└──────────────────────────────────┬──────────────────────────┘
                                   ▼
┌─ Step 4: ACTION ─────────────────────────────────────────────┐
│  Deterministic bash • ~1s • free                             │
│  → approve / request changes / comment / label / merge       │
└──────────────────────────────────────────────────────────────┘

Total: ~$0.10–0.33 per PR. Under 2 minutes.


Why this matters

gstack's superpower is its opinionated review skills — the paranoid staff engineer persona, the architecture heuristics, the severity classification framework. But right now those only fire when someone manually runs /review in Claude Code.

This pipeline makes them fire on every PR, automatically, without changing a single line of the skill files. Step 2 reads the real SKILL.md files from main and compiles them on the fly — so when gstack's skills get better, the automated reviews get better too. Zero manual sync.


The scoring and routing

Every PR gets a structured JSON review with 5-dimension scores:

| Dimension | What it checks |
| --- | --- |
| Design | Architecture fit, abstraction quality, readability |
| Security | OWASP patterns, injection, auth, secrets |
| Performance | N+1 queries, resource leaks, missing timeouts |
| Test Coverage | New paths tested, edge cases, regression tests |
| Completeness | Does the diff match the PR description? |

The key differentiator isn't just cost — it's that the review knowledge comes from this repo's own skill files. Not a generic prompt. Not Anthropic's internal review framework. The same opinionated, battle-tested staff engineer persona from /review, compiled for CI.

Claude Code Review is a great product for enterprises that want zero-config depth. This is for people who already have gstack and want it running continuously.


What's in the PR

.github/
├── workflows/
│   └── gstack-pr-review.yml        # The 4-step workflow
└── gstack-review/
    ├── triage.py                    # HuggingFace Qwen classifier
    ├── compile-instructions.md      # Meta-prompt for skill compilation
    ├── review-schema.json           # JSON output contract
    └── route-action.sh              # Deterministic action routing

Five files. The skill files are untouched — read at runtime from main.

Setup: Two secrets (ANTHROPIC_API_KEY required, HF_TOKEN recommended), one variable (AUTO_MERGE_ENABLED, default "false"), and optionally a GitHub App for merge authority.


tl;dr

gstack already has the best review skills. This makes them run on every PR without anyone lifting a finger — 50–100x cheaper and 10x faster than the managed alternative. The skills stay in your control, the triage is transparent, and the automation is configurable from "comment only" all the way to "auto-merge".

Every review produces a downloadable JSON artifact — full audit trail of what was scored, what was found, and what action was taken.
