feat: add secure GitHub webhook ingestion foundation #17
Walkthrough
This PR adds the GitHub webhook ingestion foundation to Gitcord (no public HTTP endpoint yet). It introduces webhook configuration models with validation, a new storage table for deduplication, a webhook processing engine with HMAC-SHA256 signature verification and event mapping, and tests. Webhook events are mapped to ContributionEvent records and deduplicated by delivery ID before storage.
Changes
GitHub Webhook Ingestion
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/ghdcbot/adapters/storage/sqlite.py`:
- Around line 325-339: In record_webhook_delivery, replace timezone.utc with
the Python 3.11+ alias UTC when constructing the timestamp. UTC is an attribute
of the datetime module, not of the datetime class, so
datetime.now(datetime.UTC) raises AttributeError when datetime names the class.
Assuming the module uses from datetime import datetime, timezone, change the
import to from datetime import UTC, datetime and write
now = datetime.now(UTC).isoformat(). Keep the rest of the function (connection,
INSERT, IntegrityError handling) unchanged.
In `@src/ghdcbot/config/models.py`:
- Around line 230-234: The validate_enabled_secret model validator currently
annotates its return type as a quoted string "WebhookConfig"; since from
__future__ import annotations is present, change the annotation to an unquoted
WebhookConfig on the validate_enabled_secret method so the signature uses
WebhookConfig (no quotes), keeping the same behavior for the model_validator
decorator and returning self.
In `@src/ghdcbot/engine/webhooks.py`:
- Around line 79-87: The code currently calls
record_delivery(delivery.delivery_id, delivery.event_name) before persisting
events (storage.record_contributions), which can mark retries as duplicates if
persistence fails; change the flow so dedupe marking and contributions
persistence are done atomically: either perform storage.record_contributions
first and only call record_delivery on successful write, or (preferably) use a
single transactional operation in your data layer that inserts contributions and
sets the delivery as processed together; update the logic around record_delivery
and storage.record_contributions (referenced symbols: record_delivery,
storage.record_contributions, DeliveryIngestResult, delivery.delivery_id,
delivery.event_name, delivery.events) to ensure failures roll back and do not
leave the delivery marked processed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 175b7768-0075-4dcc-a31c-88a95af800d5
📒 Files selected for processing (9)
- .env.example
- README.md
- config/example.yaml
- src/ghdcbot/adapters/storage/sqlite.py
- src/ghdcbot/config/models.py
- src/ghdcbot/core/interfaces.py
- src/ghdcbot/engine/webhooks.py
- tests/test_config.py
- tests/test_webhooks.py
def record_webhook_delivery(self, delivery_id: str, event_name: str) -> bool:
    # GitHub can retry the same delivery.
    now = datetime.now(timezone.utc).isoformat()
    with self._connect() as conn:
        try:
            conn.execute(
                """
                INSERT INTO webhook_deliveries (delivery_id, event_name, received_at)
                VALUES (?, ?, ?)
                """,
                (delivery_id, event_name, now),
            )
        except sqlite3.IntegrityError:
            return False
    return True
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Consider using datetime.UTC alias.
Python 3.11+ (which this project requires) provides datetime.UTC as a cleaner alias for timezone.utc.
♻️ Optional modernization
 def record_webhook_delivery(self, delivery_id: str, event_name: str) -> bool:
     # GitHub can retry the same delivery.
-    now = datetime.now(timezone.utc).isoformat()
+    now = datetime.now(UTC).isoformat()  # requires: from datetime import UTC, datetime
     with self._connect() as conn:
📝 Committable suggestion
def record_webhook_delivery(self, delivery_id: str, event_name: str) -> bool:
    # GitHub can retry the same delivery.
    # UTC is the module-level alias: from datetime import UTC, datetime
    now = datetime.now(UTC).isoformat()
    with self._connect() as conn:
        try:
            conn.execute(
                """
                INSERT INTO webhook_deliveries (delivery_id, event_name, received_at)
                VALUES (?, ?, ?)
                """,
                (delivery_id, event_name, now),
            )
        except sqlite3.IntegrityError:
            return False
    return True
🧰 Tools
🪛 Ruff (0.15.12)
[warning] 327-327: Use datetime.UTC alias
Convert to datetime.UTC alias
(UP017)
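The alias that UP017 refers to lives on the datetime module, not on the datetime class. A minimal sketch of that distinction, assuming the storage module imports the class with `from datetime import datetime, timezone` (suggested by the existing `datetime.now(timezone.utc)` call, not confirmed here); `utc_timestamp` is a hypothetical helper used only for illustration:

```python
import datetime as dt
from datetime import datetime, timezone

# datetime.UTC is a module-level alias added in Python 3.11.
# The getattr fallback only exists so this sketch also runs on older interpreters.
UTC = getattr(dt, "UTC", timezone.utc)


def utc_timestamp() -> str:
    """Return the current time as an ISO 8601 string with a UTC offset."""
    return datetime.now(UTC).isoformat()


# The alias is the very same object as timezone.utc.
same_object = UTC is timezone.utc

# The datetime *class* has no UTC attribute, so `datetime.UTC` fails
# whenever `datetime` was imported as the class rather than the module.
class_has_alias = hasattr(datetime, "UTC")
```

In a module that already does `from datetime import datetime`, the import would therefore become `from datetime import UTC, datetime` and the call `datetime.now(UTC)`.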
@model_validator(mode="after")
def validate_enabled_secret(self) -> "WebhookConfig":
    if self.enabled and not self.secret:
        raise ValueError("webhooks.secret is required when webhooks.enabled is true")
    return self
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Consider unquoting the return type annotation.
Since from __future__ import annotations is already imported (line 1), the return type can be unquoted for consistency.
♻️ Optional style improvement
 @model_validator(mode="after")
-def validate_enabled_secret(self) -> "WebhookConfig":
+def validate_enabled_secret(self) -> WebhookConfig:
     if self.enabled and not self.secret:
📝 Committable suggestion
@model_validator(mode="after")
def validate_enabled_secret(self) -> WebhookConfig:
    if self.enabled and not self.secret:
        raise ValueError("webhooks.secret is required when webhooks.enabled is true")
    return self
🧰 Tools
🪛 Ruff (0.15.12)
[warning] 231-231: Remove quotes from type annotation
Remove quotes
(UP037)
[warning] 233-233: Avoid specifying long messages outside the exception class
(TRY003)
if not record_delivery(delivery.delivery_id, delivery.event_name):
    return DeliveryIngestResult(
        delivery_id=delivery.delivery_id,
        event_name=delivery.event_name,
        stored=0,
        duplicate=True,
    )

stored = storage.record_contributions(delivery.events) if delivery.events else 0
Make dedupe marking and contribution persistence atomic
Line 79 records the delivery as processed before Line 87 writes events. If record_contributions(...) fails (e.g., transient DB error), retries will be flagged duplicate and the events are lost permanently.
Suggested direction
 def ingest_github_delivery(storage: Any, delivery: GitHubDelivery) -> DeliveryIngestResult:
-    record_delivery = getattr(storage, "record_webhook_delivery", None)
-    if not callable(record_delivery):
-        raise WebhookError("Storage adapter does not support webhook delivery dedupe")
-
-    if not record_delivery(delivery.delivery_id, delivery.event_name):
-        return DeliveryIngestResult(
-            delivery_id=delivery.delivery_id,
-            event_name=delivery.event_name,
-            stored=0,
-            duplicate=True,
-        )
-
-    stored = storage.record_contributions(delivery.events) if delivery.events else 0
-    return DeliveryIngestResult(
-        delivery_id=delivery.delivery_id,
-        event_name=delivery.event_name,
-        stored=stored,
-    )
+    # Prefer a single storage call that performs:
+    # 1) insert delivery-id if new
+    # 2) persist contributions
+    # in one DB transaction.
+    ingest = getattr(storage, "ingest_webhook_delivery", None)
+    if not callable(ingest):
+        raise WebhookError("Storage adapter does not support atomic webhook ingest")
+    stored, duplicate = ingest(
+        delivery_id=delivery.delivery_id,
+        event_name=delivery.event_name,
+        events=delivery.events,
+    )
+    return DeliveryIngestResult(
+        delivery_id=delivery.delivery_id,
+        event_name=delivery.event_name,
+        stored=stored,
+        duplicate=duplicate,
+    )
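The storage side of an atomic ingest might look roughly like the sketch below. It is not the project's adapter: the `contributions` table and its columns, the `open_db` helper, the `PRIMARY KEY` on `delivery_id`, and passing a raw connection are all assumptions made for illustration. Only the `webhook_deliveries` column names come from the INSERT shown in this PR, and `ingest_webhook_delivery` is the name proposed in the review.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: only the webhook_deliveries columns appear in the PR.
SCHEMA = """
CREATE TABLE IF NOT EXISTS webhook_deliveries (
    delivery_id TEXT PRIMARY KEY,
    event_name  TEXT NOT NULL,
    received_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS contributions (
    delivery_id TEXT NOT NULL,
    kind        TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open a connection and create the illustrative tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def ingest_webhook_delivery(conn, delivery_id, event_name, events):
    """Record the delivery and its events in one transaction.

    Returns (stored, duplicate). If writing the events raises, the
    delivery row is rolled back too, so a retry is not seen as a duplicate.
    """
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on normal exit, rolls back if an exception escapes
        try:
            conn.execute(
                "INSERT INTO webhook_deliveries (delivery_id, event_name, received_at)"
                " VALUES (?, ?, ?)",
                (delivery_id, event_name, now),
            )
        except sqlite3.IntegrityError:
            # Primary-key conflict: this delivery was already processed.
            return 0, True
        conn.executemany(
            "INSERT INTO contributions (delivery_id, kind) VALUES (?, ?)",
            [(delivery_id, kind) for kind in events],
        )
    return len(events), False
```

Only the delivery INSERT is wrapped in `try`, so a failure while writing events propagates out of the `with` block and undoes the dedupe marker along with any partially written rows.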
What changed
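- Verifies the HMAC-SHA256 signature sent in X-Hub-Signature-256.
- Deduplicates deliveries by X-GitHub-Delivery.
- Maps webhook events to ContributionEvent records.
Why
This prepares Gitcord for org-level webhook sync without replacing /sync yet. /sync remains the backfill and reconcile path.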
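For readers unfamiliar with the X-Hub-Signature-256 check, here is a minimal sketch of how such verification is commonly done. GitHub sends `sha256=` followed by the hex HMAC-SHA256 of the raw request body, keyed with the webhook secret. The function names `sign_payload` and `verify_signature` are hypothetical and are not taken from `src/ghdcbot/engine/webhooks.py`.

```python
import hashlib
import hmac
from typing import Optional


def sign_payload(secret: str, body: bytes) -> str:
    """Build the header value GitHub would send for this body and secret."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_signature(secret: str, body: bytes, header_value: Optional[str]) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    if not header_value or not header_value.startswith("sha256="):
        return False
    expected = sign_payload(secret, body)
    # Compare as bytes in constant time to avoid leaking timing information.
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))
```

The check has to run over the exact bytes received, before any JSON parsing or re-serialization, because any change to the body changes the digest.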
How tested
- ./.venv/bin/python -m pytest
- ./.venv/bin/python -m ruff check src/ghdcbot/engine/webhooks.py src/ghdcbot/config/models.py src/ghdcbot/core/interfaces.py src/ghdcbot/adapters/storage/sqlite.py tests/test_webhooks.py tests/test_config.py
Addressed Issues:
Related to #12
Screenshots/Recordings:
Not applicable. This is backend webhook ingestion logic with tests and docs.
Additional Notes:
This PR adds the webhook ingestion foundation only. It does not add a public HTTP endpoint and does not replace /sync. A later PR can wire this into a persistent server.
AI Usage Disclosure:
blob/main/AI-UsagePolicy.md) and this PR complies with this policy. I have tested the code locally and I am
responsible for it.
I have used the following AI models and tools: OpenAI ChatGPT/Codex