@devin-ai-integration

Add GitHub worker for processing webhooks and fetching markdown files

Summary

This PR introduces a new github-worker package that processes GitHub webhooks to automatically fetch and store .md and .mdx files from repositories. The worker implements:

  • Webhook Processing: Handles GitHub push and pull_request events with HMAC-SHA256 signature verification
  • GitHub API Integration: Recursively fetches all markdown files from repositories with retry logic and rate limiting handling
  • Versioned Database Storage: Uses Cloudflare Durable Objects to store files with version tracking via SQL database
  • Modular Architecture: Clean separation between webhook handling, GitHub API client, and database operations
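The signature check in the first bullet can be sketched as follows. This is a minimal illustration, not the code in github-worker/src/webhook.ts: the function names signPayload and verifySignature are hypothetical, and it assumes a Node-compatible node:crypto module is available (on Cloudflare Workers that would mean enabling the nodejs_compat flag; the Web Crypto API is the alternative). GitHub sends the signature in the X-Hub-Signature-256 header as "sha256=" followed by the hex digest of the raw request body.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute the value GitHub would send in X-Hub-Signature-256 for a raw body.
// Hypothetical helper name; not taken from the PR.
export function signPayload(secret: string, rawBody: string): string {
  const digest = createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
  return `sha256=${digest}`;
}

// Return true only when the header matches the HMAC of the raw request body.
export function verifySignature(
  secret: string,
  rawBody: string,
  signatureHeader: string | null,
): boolean {
  if (!signatureHeader) return false;
  const expected = Buffer.from(signPayload(secret, rawBody), "utf8");
  const received = Buffer.from(signatureHeader, "utf8");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The comparison must run on the raw body text exactly as received; re-serializing parsed JSON can change the bytes and make a valid signature fail.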

The implementation follows existing monorepo patterns and integrates with the Cloudflare Workers ecosystem using wrangler configuration.
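As a rough illustration of the file selection described in the summary, the sketch below filters a repository tree down to .md and .mdx files. It assumes the worker lists files through GitHub's Git Trees API with recursive=1, whose tree entries carry path, type, and sha fields; the PR does not say which endpoint it uses, and the names TreeEntry, isMarkdownPath, and selectMarkdownFiles are hypothetical. Matching extensions case-insensitively is also an assumption.

```typescript
// Subset of one entry in the `tree` array returned by the Git Trees API.
interface TreeEntry {
  path: string;
  type: string; // "blob" for files, "tree" for directories
  sha: string;
}

// True for paths ending in .md or .mdx (case-insensitive by assumption).
export function isMarkdownPath(path: string): boolean {
  return /\.mdx?$/i.test(path);
}

// Keep only file entries whose path has a markdown extension.
export function selectMarkdownFiles(tree: TreeEntry[]): TreeEntry[] {
  return tree.filter((entry) => entry.type === "blob" && isMarkdownPath(entry.path));
}
```

Filtering on type as well as extension avoids treating a directory that happens to be named like a markdown file as a file to fetch.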

Review & Testing Checklist for Human

⚠️ CRITICAL - This PR requires extensive testing as the core functionality could not be fully verified locally

  • End-to-end webhook testing: Set up a test repository with GitHub webhooks pointing to the deployed worker and verify files are correctly fetched and stored
  • Security verification: Test webhook signature verification with real GitHub webhook payloads to ensure HMAC-SHA256 implementation is correct
  • Database operations: Verify the SQL schema creation, versioned upsert functionality, and data retrieval work correctly in the Durable Object environment
  • GitHub API integration: Test with repositories containing various .md/.mdx file structures, including nested directories and edge cases
  • Error handling: Verify graceful handling of GitHub API rate limits, authentication failures, and malformed webhook payloads

Recommended test plan: Deploy to a staging environment, configure webhooks on a test repository with various markdown files, trigger events, and verify data storage and retrieval.
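For the rate-limit and error-handling items in the checklist, it may help to have a reference for what "graceful handling" could look like. The sketch below is an assumption about one reasonable policy, not the retry logic in github-worker/src/github-api.ts: it honors the retry-after header, waits until x-ratelimit-reset when x-ratelimit-remaining is 0, backs off exponentially (capped at 30 seconds) on 429 and 5xx responses, and does not retry other statuses. The names retryDelayMs and withRetry and the cap and attempt values are hypothetical.

```typescript
// Minimal structural types so the sketch does not depend on a fetch typing.
interface HeaderLookup {
  get(name: string): string | null;
}
interface HttpResponse {
  status: number;
  headers: HeaderLookup;
}

// Decide how long to wait before retrying, or null when a retry is pointless.
export function retryDelayMs(
  status: number,
  headers: HeaderLookup,
  attempt: number,
  nowMs: number = Date.now(),
): number | null {
  const limited = status === 403 || status === 429;
  const retryAfter = headers.get("retry-after");
  if (limited && retryAfter !== null && /^\d+$/.test(retryAfter)) {
    return Number(retryAfter) * 1000; // header value is in seconds
  }
  const reset = headers.get("x-ratelimit-reset");
  if (limited && headers.get("x-ratelimit-remaining") === "0" && reset !== null && /^\d+$/.test(reset)) {
    return Math.max(0, Number(reset) * 1000 - nowMs); // reset is epoch seconds
  }
  if (status === 429 || status >= 500) {
    return Math.min(30_000, 1000 * 2 ** attempt); // capped exponential backoff
  }
  return null; // success, auth failure, not found, plain 403: do not retry
}

// Call `request` until it returns a non-retryable response or attempts run out.
export async function withRetry<T extends HttpResponse>(
  request: () => Promise<T>,
  maxAttempts = 4,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const response = await request();
    const delay = retryDelayMs(response.status, response.headers, attempt);
    if (delay === null || attempt + 1 >= maxAttempts) return response;
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```

Keeping the delay decision in a pure function makes the policy easy to unit test without network access, which is relevant given that the worker could not be exercised locally.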


Diagram

%%{ init : { "theme" : "default" }}%%
graph TB
    GitHub["GitHub Repository<br/>(webhook source)"] --> Webhook["github-worker/src/webhook.ts<br/>Webhook Handler"]:::major-edit
    Webhook --> API["github-worker/src/github-api.ts<br/>GitHub API Client"]:::major-edit
    API --> Database["github-worker/src/database.ts<br/>Durable Object DB"]:::major-edit
    
    Index["github-worker/src/index.ts<br/>Main Worker"]:::major-edit --> Webhook
    Types["github-worker/src/types.ts<br/>Type Definitions"]:::major-edit --> Database
    Types --> API
    Types --> Webhook
    
    Config["github-worker/wrangler.toml<br/>Worker Configuration"]:::major-edit --> Index
    PackageRoot["package.json<br/>(root workspace)"]:::minor-edit --> PackageWorker["github-worker/package.json"]:::major-edit
    
    subgraph Legend
        L1[Major Edit]:::major-edit
        L2[Minor Edit]:::minor-edit
        L3[Context/No Edit]:::context
    end

    classDef major-edit fill:#90EE90
    classDef minor-edit fill:#87CEEB
    classDef context fill:#FFFFFF

Notes

  • TypeScript types: The any type is used in several places because of the complexity of the Cloudflare Workers type system - these should be replaced with precise types for better type safety
  • Local testing limitations: Could not fully test the worker locally due to missing wrangler installation and Cloudflare runtime dependencies
  • Security consideration: The webhook signature verification is critical for security - ensure the GITHUB_WEBHOOK_SECRET environment variable matches your GitHub webhook configuration
  • Database schema: The versioned upsert approach maintains file history - consider storage implications for repositories with frequent changes
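To make the storage note concrete, here is an in-memory model of a versioned upsert. It is only a sketch of the behavior described above, not the SQL schema or the Durable Object code in github-worker/src/database.ts; the class name VersionedFileStore and its fields are hypothetical. It also assumes that an upsert with an unchanged blob SHA does not create a new version, which is one way to limit growth for repositories with frequent pushes; the PR does not state whether the worker does this.

```typescript
interface FileVersion {
  path: string;
  version: number; // 1 for the first stored copy, incremented per change
  sha: string;
  content: string;
}

// In-memory stand-in for a table keyed by (path, version).
export class VersionedFileStore {
  private readonly history = new Map<string, FileVersion[]>();

  // Append a new version unless the latest stored SHA is identical (assumption).
  upsert(path: string, sha: string, content: string): FileVersion {
    const versions = this.history.get(path) ?? [];
    const latest = versions[versions.length - 1];
    if (latest && latest.sha === sha) return latest;
    const next: FileVersion = { path, version: (latest?.version ?? 0) + 1, sha, content };
    versions.push(next);
    this.history.set(path, versions);
    return next;
  }

  // Most recent version of a file, or undefined if it was never stored.
  latest(path: string): FileVersion | undefined {
    const versions = this.history.get(path);
    return versions ? versions[versions.length - 1] : undefined;
  }

  // Number of versions retained for a file.
  versionCount(path: string): number {
    return this.history.get(path)?.length ?? 0;
  }
}
```

Because every changed file adds a row rather than overwriting one, total storage grows with the number of edits, not the number of files, which is the implication the note asks reviewers to consider.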

Link to Devin run: https://app.devin.ai/sessions/154eef22d18b4ecd9a571780c03094d8
Requested by: @nathanclevenger

- Creates a new github-worker package with webhook processing capabilities
- Implements a GitHub API client with retry logic for fetching .md/.mdx files
- Uses a Durable Object database with versioned upsert functionality
- Verifies webhook signatures using HMAC-SHA256
- Handles push and pull_request events from GitHub
- Follows existing monorepo patterns and conventions
- Adds error handling and logging
- Updates the root package.json to include github-worker in workspaces

Co-Authored-By: Nathan <[email protected]>