**README.md** (1 addition, 0 deletions)

### Depth Triage & Analysis Workflows

- [🏷️ Issue Triage](docs/issue-triage.md) - Triage issues and pull requests
- [🔍 Issue Duplication Detector](docs/issue-duplication-detector.md) - Detect duplicate issues and suggest next steps
- [🏥 CI Doctor](docs/ci-doctor.md) - Monitor CI workflows and investigate failures automatically
- [🔍 Repo Ask](docs/repo-ask.md) - Intelligent research assistant for repository questions and analysis
- [🔍 Daily Accessibility Review](docs/daily-accessibility-review.md) - Review application accessibility by automatically running and using the application
**docs/issue-duplication-detector.md** (new file, 100 additions)

# 🔍 Issue Duplication Detector

> For an overview of all available workflows, see the [main README](../README.md).

The [issue duplication detector workflow](../workflows/issue-duplication-detector.md?plain=1) runs every 6 hours to detect duplicate issues in the repository and suggest next steps.

## Installation

```bash
# Install the 'gh aw' extension
gh extension install github/gh-aw

# Add the Issue Duplication Detector workflow to your repository
gh aw add githubnext/agentics/issue-duplication-detector
```

The `gh aw add` command walks you through adding the workflow to your repository.

You must also [choose a coding agent](https://github.github.com/gh-aw/reference/engines/) and add an API key secret for the agent to your repository.

You can manually trigger this workflow using `gh aw run issue-duplication-detector` or wait for it to run automatically on its 6-hour schedule.

**Mandatory Checklist**

* [ ] If in a fork, enable GitHub Actions and Issues in the fork settings

## Configuration

This workflow requires no configuration and works out of the box. The workflow uses intelligent semantic analysis to detect duplicate issues by comparing titles, descriptions, and content.

### How It Works

The workflow operates on a 6-hour batch schedule:

1. **Searches for recent issues**: Queries for issues created or updated in the last 6 hours
2. **Analyzes each issue**: Extracts key information from the issue title and body
3. **Searches for duplicates**: Uses GitHub search with keywords to find similar existing issues
4. **Compares semantically**: Analyzes whether issues describe the same underlying problem or request
5. **Posts helpful comments**: If duplicates are found, adds a polite comment with:
- Links to potential duplicate issues
- Explanation of why they appear to be duplicates
- Suggested next steps for the issue author
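
The first three steps can be approximated by hand with the `gh` CLI. This is only a sketch: `OWNER/REPO` and the search keywords are placeholders, and the workflow itself uses its built-in GitHub tools rather than these exact commands.

```shell
# Cutoff for "recent": 6 hours ago, in ISO 8601 (GNU date; on macOS use: date -u -v-6H ...)
SINCE=$(date -u -d '6 hours ago' +%Y-%m-%dT%H:%M:%SZ)

# Search query for issues created or updated since the cutoff
QUERY="repo:OWNER/REPO is:issue updated:>=$SINCE"
echo "$QUERY"

# With an authenticated gh, the two searches would look roughly like:
#   gh search issues "$QUERY" --json number,title,body
#   gh search issues --repo OWNER/REPO "custom templates" --json number,title,state
```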

### Batch Processing & Cost Control

- Runs every 6 hours to batch-process multiple issues in a single workflow run
- Only comments when high-confidence duplicates are found
- Maximum 10 comments per run to prevent excessive API usage
- 15-minute timeout ensures predictable runtime costs
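
Both limits come from the workflow's frontmatter (excerpted below); after `gh aw add`, you can raise or lower them by editing the workflow file:

```yaml
safe-outputs:
  add-comment:
    max: 10          # at most 10 comments per run

timeout-minutes: 15  # hard cap on runtime
```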

After editing the workflow, run `gh aw compile` to update it, then commit all changes to the default branch.

## What it reads from GitHub

- Recently created or updated issues (last 6 hours)
- Full issue details including title, body, and metadata
- Repository issue history for duplicate detection
- Both open and closed issues for comprehensive analysis

## What it creates

- Adds comments to issues that appear to be duplicates
- Comments include links to potential duplicates and suggested next steps
- Requires `issues: write` permission

## What web searches it performs

- Does not perform web searches; operates entirely within GitHub data

## Human in the loop

- Review duplicate detection comments for accuracy
- Verify that flagged issues are truly duplicates
- Close duplicate issues or provide clarification if the detection was incorrect
- Add any missing context to the original issue if the duplicate has valuable additional information
- Monitor false positives and disable the workflow if accuracy is not acceptable

## Activity duration

- By default, this workflow triggers for at most 30 days, after which it stops triggering.
- This allows you to experiment with the workflow for a limited time before deciding whether to keep it active.

## Example Output

When a duplicate is detected, the workflow posts a comment like:

```markdown
👋 Hi! It looks like this issue might be a duplicate of existing issue(s):

- #123 - Add support for custom templates

Both issues describe the need for customizable templates in the project configuration.

**Suggested next steps:**
- Review issue #123 to see if it addresses your concern
- If this issue has additional context not covered in #123, consider adding it there
- If they are indeed the same, this issue can be closed as a duplicate

Let us know if you think this assessment is incorrect!
```
**workflows/issue-duplication-detector.md** (new file, 103 additions)

---
description: Detect duplicate issues and suggest next steps (batched every 6 hours)
on:
schedule:
- cron: "0 */6 * * *" # Every 6 hours
workflow_dispatch:

permissions: read-all

tools:
github:
toolsets: [default]
bash:
- "*"

safe-outputs:
add-comment:
max: 10 # Allow multiple comments in batch mode

timeout-minutes: 15
---

# Issue Duplication Detector

You are an AI agent that detects duplicate issues in the repository `${{ github.repository }}`.

## Your Task

Analyze recently created or updated issues to determine if they are duplicates of existing issues. This workflow runs every 6 hours to batch-process issues, providing cost control and natural request batching.

## Instructions

1. **Find recent issues to check**:
- Use GitHub tools to search for issues in this repository that were created or updated in the last 6 hours
- Construct a query like: `repo:${{ github.repository }} is:issue updated:>=<timestamp-6-hours-ago>`
   - The timestamp should be in ISO 8601 format (e.g., `2024-02-04T17:00:00Z`)
- This captures any issues that might have been created or edited since the last run
- If no recent issues are found, exit successfully without further action

2. **For each recent issue found**:
- Fetch the full issue details using GitHub tools
- Note the issue number, title, and body content

3. **Search for duplicate issues**:
- For each recent issue, use GitHub tools to search for similar existing issues
- Search using keywords from the issue's title and body
- Look for issues that describe the same problem, feature request, or topic
- Consider both open and closed issues (closed issues might have been resolved)
- Focus on semantic similarity, not just exact keyword matches
- Exclude the current issue itself from the duplicate search

4. **Analyze and compare**:
- Review the content of potentially duplicate issues
- Determine if they are truly duplicates or just similar topics
- A duplicate means the same underlying problem, request, or discussion
- Consider that different wording might describe the same issue

5. **For issues with duplicates found**:
- Use the `output.add-comment` safe output to post a comment on the issue
- In your comment:
- Politely inform that this appears to be a duplicate
- List the duplicate issue(s) with their numbers and titles using markdown links (e.g., "This appears to be a duplicate of #123")
- Provide a brief explanation of why they are duplicates
- Suggest next steps, such as:
- Reviewing the existing issue(s) to see if they already address the concern
- Adding any new information to the existing issue if this one has additional context
- Closing this issue as a duplicate if appropriate
- Keep the tone helpful and constructive

6. **For issues with no duplicates**:
- Do not add any comment
- The issue is unique and can proceed normally
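
As one concrete way to compute the step-1 timestamp and query (a sketch using GNU `date`; any equivalent method is fine, and the `OWNER/REPO` fallback is only a placeholder):

```shell
SINCE=$(date -u -d '6 hours ago' +%Y-%m-%dT%H:%M:%SZ)
# GITHUB_REPOSITORY is set automatically in Actions runs
echo "repo:${GITHUB_REPOSITORY:-OWNER/REPO} is:issue updated:>=$SINCE"
```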

## Important Guidelines

- **Batch processing**: Process multiple issues in a single run when available
- **Analysis and comments only**: You only analyze issues and post comments; you do not modify, label, or close issues
- **Be thorough**: Search comprehensively to avoid false negatives
- **Be accurate**: Only flag clear duplicates to avoid false positives
- **Be helpful**: Provide clear reasoning and actionable suggestions
- **Use safe-outputs**: Always use `output.add-comment` for commenting, never try to use GitHub write APIs directly
- **Cost control**: The 6-hour batching window provides a natural upper bound on costs

## Example Comment Format

When you find duplicates, structure your comment like this:

```markdown
👋 Hi! It looks like this issue might be a duplicate of existing issue(s):

- #123 - [Title of duplicate issue]

Both issues describe [brief explanation of the common problem/request].

**Suggested next steps:**
- Review issue #123 to see if it addresses your concern
- If this issue has additional context not covered in #123, consider adding it there
- If they are indeed the same, this issue can be closed as a duplicate

Let us know if you think this assessment is incorrect!
```

Remember: Only comment if you have high confidence that duplicates exist.