Skip to content

feat: add secure GitHub webhook ingestion foundation#17

Open
swas3301 wants to merge 1 commit into
AOSSIE-Org:mainfrom
swas3301:feat/github-webhook-ingestion
Open

feat: add secure GitHub webhook ingestion foundation#17
swas3301 wants to merge 1 commit into
AOSSIE-Org:mainfrom
swas3301:feat/github-webhook-ingestion

Conversation

@swas3301
Copy link
Copy Markdown

@swas3301 swas3301 commented May 11, 2026

What changed

  • Added GitHub webhook signature verification using X-Hub-Signature-256.
  • Added webhook delivery dedupe using X-GitHub-Delivery.
  • Added mapping from selected GitHub webhook payloads to existing ContributionEvent records.
  • Added optional webhook config and docs.
  • Added tests for signature verification, event mapping, delivery requirements, and dedupe.

Why

This prepares Gitcord for org-level webhook sync without replacing /sync yet. /sync remains the backfill and
reconcile path.

How tested

  • ./.venv/bin/python -m pytest
  • ./.venv/bin/python -m ruff check src/ghdcbot/engine/webhooks.py src/ghdcbot/config/models.py src/ghdcbot/ core/interfaces.py src/ghdcbot/adapters/storage/sqlite.py tests/test_webhooks.py tests/test_config.py

Addressed Issues:

Related to #12

Screenshots/Recordings:

Not applicable. This is backend webhook ingestion logic with tests and docs.

Additional Notes:

This PR adds the webhook ingestion foundation only. It does not add a public HTTP endpoint and does not replace
/sync. A later PR can wire this into a persistent server.

AI Usage Disclosure:

  • This PR does not contain AI-generated code at all.
  • This PR contains AI-generated code. I have read the [AI Usage Policy](https://github.com/AOSSIE-Org/Info/
    blob/main/AI-UsagePolicy.md) and this PR complies with this policy. I have tested the code locally and I am
    responsible for it.

I have used the following AI models and tools: OpenAI ChatGPT/Codex

Checklist

  • My PR addresses a single issue, fixes a single bug or makes a single improvement.
  • My code follows the project's code style and conventions
  • If applicable, I have made corresponding changes or additions to the documentation
  • If applicable, I have made corresponding changes or additions to tests
  • My changes generate no new warnings or errors
  • I have joined the Discord server and I will share a link to this PR with
    the project maintainers there
  • I have read the Contribution Guidelines
  • Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.
  • I have filled this PR template completely and carefully, and I understand that my PR may be closed without
    review otherwise.

Summary by CodeRabbit

  • New Features

    • Added GitHub webhook ingestion to accept organization-level webhook events (pull requests, issues, comments, reviews).
    • Webhook payloads are verified via HMAC signatures and automatically deduplicated to handle GitHub retries.
  • Documentation

    • Updated README with GitHub webhook configuration guidance and setup instructions.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 11, 2026

Walkthrough

This PR adds complete GitHub webhook support to Gitcord. It introduces webhook configuration models with validation, a new storage table for deduplication, a full webhook processing engine with HMAC-SHA256 signature verification and event mapping, and comprehensive tests. Webhook events are mapped to ContributionEvent records and deduplicated by delivery ID before storage.

Changes

GitHub Webhook Ingestion

Layer / File(s) Summary
Configuration Models & Validation
src/ghdcbot/config/models.py
New WebhookConfig model with enabled and secret fields. Validators enforce minimum 16-char secret length and require secret when enabled=true. BotConfig gains optional webhooks field.
Storage Interface Contract
src/ghdcbot/core/interfaces.py
Storage protocol updated with record_webhook_delivery(delivery_id, event_name) -> bool method to signal new vs. duplicate deliveries.
Storage Implementation
src/ghdcbot/adapters/storage/sqlite.py
Add webhook_deliveries table with delivery_id primary key. Implement record_webhook_delivery to insert delivery records and return False on duplicate delivery_id (IntegrityError).
Webhook Processing Engine
src/ghdcbot/engine/webhooks.py
Complete webhook handler with HMAC-SHA256 verification, header extraction (event name, delivery id), and mapping of pull_request, pull_request_review, issues, and issue_comment events to ContributionEvent objects. ingest_github_delivery deduplicates via storage before persisting. Helper functions normalize repo names, extract labels, parse timestamps to UTC, and retrieve headers case-insensitively.
Configuration Examples & Documentation
.env.example, config/example.yaml, README.md
Add GITHUB_WEBHOOK_SECRET environment variable example, optional webhooks config section with enabled and secret fields, and README section explaining webhook ingestion flow, signature verification, deduplication, and operational guidance.
Tests & Validation
tests/test_config.py, tests/test_webhooks.py
Validate webhook config rules (secret required when enabled). Test signature verification success/failure. Test event mapping for PR/issue/comment payloads. Verify header requirements. Test end-to-end deduplication with SqliteStorage.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

Python Lang, Documentation

🐰 A webhook bounced from GitHub's shore,
With signatures checked, plus one more!
Events now map to contributions so bright,
Deduplicated delivery done just right.

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title clearly and accurately summarizes the main change: adding secure GitHub webhook ingestion as the foundation for a new feature.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/ghdcbot/adapters/storage/sqlite.py`:
- Around line 325-339: In record_webhook_delivery, replace the use of
timezone.utc with the Python 3.11+ alias datetime.UTC when constructing the
timestamp (now = datetime.now(datetime.UTC).isoformat()) so the code uses the
modern alias; update the import usage if needed so datetime.UTC is referenced
from the datetime module and keep the rest of the function (connection, INSERT,
IntegrityError handling) unchanged.

In `@src/ghdcbot/config/models.py`:
- Around line 230-234: The validate_enabled_secret model validator currently
annotates its return type as a quoted string "WebhookConfig"; since from
__future__ import annotations is present, change the annotation to an unquoted
WebhookConfig on the validate_enabled_secret method so the signature uses
WebhookConfig (no quotes), keeping the same behavior for the model_validator
decorator and returning self.

In `@src/ghdcbot/engine/webhooks.py`:
- Around line 79-87: The code currently calls
record_delivery(delivery.delivery_id, delivery.event_name) before persisting
events (storage.record_contributions), which can mark retries as duplicates if
persistence fails; change the flow so dedupe marking and contributions
persistence are done atomically: either perform storage.record_contributions
first and only call record_delivery on successful write, or (preferably) use a
single transactional operation in your data layer that inserts contributions and
sets the delivery as processed together; update the logic around record_delivery
and storage.record_contributions (referenced symbols: record_delivery,
storage.record_contributions, DeliveryIngestResult, delivery.delivery_id,
delivery.event_name, delivery.events) to ensure failures roll back and do not
leave the delivery marked processed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 175b7768-0075-4dcc-a31c-88a95af800d5

📥 Commits

Reviewing files that changed from the base of the PR and between 5ee89c7 and 6c42b34.

📒 Files selected for processing (9)
  • .env.example
  • README.md
  • config/example.yaml
  • src/ghdcbot/adapters/storage/sqlite.py
  • src/ghdcbot/config/models.py
  • src/ghdcbot/core/interfaces.py
  • src/ghdcbot/engine/webhooks.py
  • tests/test_config.py
  • tests/test_webhooks.py

Comment on lines +325 to +339
def record_webhook_delivery(self, delivery_id: str, event_name: str) -> bool:
# GitHub can retry the same delivery.
now = datetime.now(timezone.utc).isoformat()
with self._connect() as conn:
try:
conn.execute(
"""
INSERT INTO webhook_deliveries (delivery_id, event_name, received_at)
VALUES (?, ?, ?)
""",
(delivery_id, event_name, now),
)
except sqlite3.IntegrityError:
return False
return True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider using datetime.UTC alias.

Python 3.11+ (which this project requires) provides datetime.UTC as a cleaner alias for timezone.utc.

♻️ Optional modernization
     def record_webhook_delivery(self, delivery_id: str, event_name: str) -> bool:
         # GitHub can retry the same delivery.
-        now = datetime.now(timezone.utc).isoformat()
+        now = datetime.now(datetime.UTC).isoformat()
         with self._connect() as conn:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def record_webhook_delivery(self, delivery_id: str, event_name: str) -> bool:
# GitHub can retry the same delivery.
now = datetime.now(timezone.utc).isoformat()
with self._connect() as conn:
try:
conn.execute(
"""
INSERT INTO webhook_deliveries (delivery_id, event_name, received_at)
VALUES (?, ?, ?)
""",
(delivery_id, event_name, now),
)
except sqlite3.IntegrityError:
return False
return True
def record_webhook_delivery(self, delivery_id: str, event_name: str) -> bool:
# GitHub can retry the same delivery.
now = datetime.now(datetime.UTC).isoformat()
with self._connect() as conn:
try:
conn.execute(
"""
INSERT INTO webhook_deliveries (delivery_id, event_name, received_at)
VALUES (?, ?, ?)
""",
(delivery_id, event_name, now),
)
except sqlite3.IntegrityError:
return False
return True
🧰 Tools
🪛 Ruff (0.15.12)

[warning] 327-327: Use datetime.UTC alias

Convert to datetime.UTC alias

(UP017)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/ghdcbot/adapters/storage/sqlite.py` around lines 325 - 339, In
record_webhook_delivery, replace the use of timezone.utc with the Python 3.11+
alias datetime.UTC when constructing the timestamp (now =
datetime.now(datetime.UTC).isoformat()) so the code uses the modern alias;
update the import usage if needed so datetime.UTC is referenced from the
datetime module and keep the rest of the function (connection, INSERT,
IntegrityError handling) unchanged.

Comment on lines +230 to +234
@model_validator(mode="after")
def validate_enabled_secret(self) -> "WebhookConfig":
if self.enabled and not self.secret:
raise ValueError("webhooks.secret is required when webhooks.enabled is true")
return self
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider unquoting the return type annotation.

Since from __future__ import annotations is already imported (line 1), the return type can be unquoted for consistency.

♻️ Optional style improvement
     `@model_validator`(mode="after")
-    def validate_enabled_secret(self) -> "WebhookConfig":
+    def validate_enabled_secret(self) -> WebhookConfig:
         if self.enabled and not self.secret:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@model_validator(mode="after")
def validate_enabled_secret(self) -> "WebhookConfig":
if self.enabled and not self.secret:
raise ValueError("webhooks.secret is required when webhooks.enabled is true")
return self
`@model_validator`(mode="after")
def validate_enabled_secret(self) -> WebhookConfig:
if self.enabled and not self.secret:
raise ValueError("webhooks.secret is required when webhooks.enabled is true")
return self
🧰 Tools
🪛 Ruff (0.15.12)

[warning] 231-231: Remove quotes from type annotation

Remove quotes

(UP037)


[warning] 233-233: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/ghdcbot/config/models.py` around lines 230 - 234, The
validate_enabled_secret model validator currently annotates its return type as a
quoted string "WebhookConfig"; since from __future__ import annotations is
present, change the annotation to an unquoted WebhookConfig on the
validate_enabled_secret method so the signature uses WebhookConfig (no quotes),
keeping the same behavior for the model_validator decorator and returning self.

Comment on lines +79 to +87
if not record_delivery(delivery.delivery_id, delivery.event_name):
return DeliveryIngestResult(
delivery_id=delivery.delivery_id,
event_name=delivery.event_name,
stored=0,
duplicate=True,
)

stored = storage.record_contributions(delivery.events) if delivery.events else 0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Make dedupe marking and contribution persistence atomic

Line 79 records the delivery as processed before Line 87 writes events. If record_contributions(...) fails (e.g., transient DB error), retries will be flagged duplicate and the events are lost permanently.

Suggested direction
 def ingest_github_delivery(storage: Any, delivery: GitHubDelivery) -> DeliveryIngestResult:
-    record_delivery = getattr(storage, "record_webhook_delivery", None)
-    if not callable(record_delivery):
-        raise WebhookError("Storage adapter does not support webhook delivery dedupe")
-
-    if not record_delivery(delivery.delivery_id, delivery.event_name):
-        return DeliveryIngestResult(
-            delivery_id=delivery.delivery_id,
-            event_name=delivery.event_name,
-            stored=0,
-            duplicate=True,
-        )
-
-    stored = storage.record_contributions(delivery.events) if delivery.events else 0
-    return DeliveryIngestResult(
-        delivery_id=delivery.delivery_id,
-        event_name=delivery.event_name,
-        stored=stored,
-    )
+    # Prefer a single storage call that performs:
+    # 1) insert delivery-id if new
+    # 2) persist contributions
+    # in one DB transaction.
+    ingest = getattr(storage, "ingest_webhook_delivery", None)
+    if not callable(ingest):
+        raise WebhookError("Storage adapter does not support atomic webhook ingest")
+    stored, duplicate = ingest(
+        delivery_id=delivery.delivery_id,
+        event_name=delivery.event_name,
+        events=delivery.events,
+    )
+    return DeliveryIngestResult(
+        delivery_id=delivery.delivery_id,
+        event_name=delivery.event_name,
+        stored=stored,
+        duplicate=duplicate,
+    )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/ghdcbot/engine/webhooks.py` around lines 79 - 87, The code currently
calls record_delivery(delivery.delivery_id, delivery.event_name) before
persisting events (storage.record_contributions), which can mark retries as
duplicates if persistence fails; change the flow so dedupe marking and
contributions persistence are done atomically: either perform
storage.record_contributions first and only call record_delivery on successful
write, or (preferably) use a single transactional operation in your data layer
that inserts contributions and sets the delivery as processed together; update
the logic around record_delivery and storage.record_contributions (referenced
symbols: record_delivery, storage.record_contributions, DeliveryIngestResult,
delivery.delivery_id, delivery.event_name, delivery.events) to ensure failures
roll back and do not leave the delivery marked processed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant