feat: resilient background job retry & monitoring (#130) #1174

Open
mkcash wants to merge 1 commit into rohitdash08:main from mkcash:feat/job-retry-monitoring

mkcash commented Jun 3, 2026

Summary

Implements a production-ready async job retry & monitoring system as described in issue #130 ($250 bounty).

Features

  • AsyncJob model: tracks status, attempts, retry timing, duration
  • Exponential backoff: 30s base, doubles per attempt, ±20% jitter
  • Max attempt limit: configurable per job (default 3)
  • JobRunner: generic runner wrapping any callable
  • Retry queue daemon: process_retry_queue() picks up due retries
  • Admin API endpoints:
    • GET /api/admin/jobs — paginated, filterable list
    • GET /api/admin/jobs/:id — detail
    • POST /api/admin/jobs/:id/cancel — cancel pending/retrying jobs
    • POST /api/admin/jobs/retry — trigger retry queue
    • GET /api/admin/jobs/stats — totals, by status, avg duration
  • 15 unit tests: covers all model states, runner, retry queue
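
The backoff rule listed above (30 s base, doubling per attempt, ±20% jitter) could be sketched roughly as follows. This is a minimal illustration, not the code in `services/jobs.py`: the function name `backoff_delay`, the constant names, and the choice that the first retry waits about 30 s (rather than 60 s) are all assumptions.

```python
import random
from typing import Optional

BASE_DELAY_SECONDS = 30.0  # "30s base" from the feature list
JITTER_FRACTION = 0.20     # "±20% jitter" from the feature list


def backoff_delay(attempt: int, rng: Optional[random.Random] = None) -> float:
    """Return the delay in seconds before retry number `attempt` (1-based).

    Hypothetical helper: the nominal delay doubles with each attempt
    (30, 60, 120, ...) and is then scaled by a random factor in [0.8, 1.2].
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    source = rng if rng is not None else random
    nominal = BASE_DELAY_SECONDS * 2 ** (attempt - 1)
    jitter = source.uniform(-JITTER_FRACTION, JITTER_FRACTION)
    return nominal * (1.0 + jitter)
```

Under these assumptions the nominal delays for attempts 1, 2, and 3 are 30 s, 60 s, and 120 s, each varied by up to 20% in either direction. The jitter spreads out retries so that jobs which failed together do not all hit a recovering dependency at the same instant. Passing a seeded `random.Random` makes the delay reproducible in tests.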

Files

  • packages/backend/app/services/jobs.py — core job system
  • packages/backend/app/routes/jobs.py — monitoring API
  • packages/backend/tests/test_jobs.py — comprehensive tests

- AsyncJob model with status tracking
- Exponential backoff retry with jitter
- Max attempt limit configurable per job
- JobRunner: generic synchronous runner
- process_retry_queue: picks up due retries automatically
- Admin API: list/filter jobs, cancel, trigger retry, stats
- 15 unit tests
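
For reviewers who want a mental model of how the pieces in this list fit together, here is a self-contained, in-memory sketch. Only the names `AsyncJob`, `JobRunner`, and `process_retry_queue` come from the PR description; every field, signature, and status string below is an assumption, and the real model presumably persists to a database rather than a Python list. Jitter is left out to keep the example deterministic.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class AsyncJob:
    # Hypothetical fields mirroring "status, attempts, retry timing, duration".
    name: str
    max_attempts: int = 3  # default of 3 is stated in the PR description
    status: str = "pending"  # assumed values: pending/running/retrying/succeeded/failed
    attempts: int = 0
    next_retry_at: Optional[float] = None
    duration: Optional[float] = None
    last_error: Optional[str] = None


class JobRunner:
    """Synchronous runner wrapping any zero-argument callable (assumed shape)."""

    def __init__(self, func: Callable[[], object],
                 delay_for: Callable[[int], float] = lambda n: 30.0 * 2 ** (n - 1)):
        self.func = func
        self.delay_for = delay_for  # attempt number -> seconds until next retry

    def run(self, job: AsyncJob, now: Optional[float] = None) -> AsyncJob:
        now = time.time() if now is None else now
        job.status = "running"
        job.attempts += 1
        started = time.perf_counter()
        try:
            self.func()
        except Exception as exc:
            job.last_error = repr(exc)
            if job.attempts >= job.max_attempts:
                # Attempt budget exhausted: give up permanently.
                job.status, job.next_retry_at = "failed", None
            else:
                # Schedule the next attempt using the backoff function.
                job.status = "retrying"
                job.next_retry_at = now + self.delay_for(job.attempts)
        else:
            job.status, job.next_retry_at = "succeeded", None
        finally:
            job.duration = time.perf_counter() - started
        return job


def process_retry_queue(jobs: List[AsyncJob], runners: Dict[str, JobRunner],
                        now: Optional[float] = None) -> int:
    """Re-run every job whose retry time has arrived; return how many ran."""
    now = time.time() if now is None else now
    due = [j for j in jobs
           if j.status == "retrying"
           and j.next_retry_at is not None
           and j.next_retry_at <= now]
    for job in due:
        runners[job.name].run(job, now=now)
    return len(due)
```

In this sketch a job that fails once moves to `retrying` with a `next_retry_at` timestamp, and a later call to `process_retry_queue` runs it again only once that timestamp has passed. Accepting `now` as a parameter is a testing convenience, since it lets unit tests advance time without sleeping.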

mkcash requested a review from rohitdash08 as a code owner on June 3, 2026, 05:53