RFC: v0.7 storage engine — segment storage + cursor allocator #295

@hardbyte

Status

Design RFC, not implementation. This issue exists to collect input on the shape of the next storage engine before any code is written. Comments and pushback welcome.

Motivation

The 0.6 release blocker work in #169 confirmed (long-horizon benchmark, 2026-05-17) that even with ADR-012/023/025/026 in place, Awa's queue_storage degrades under a pinned MVCC horizon: throughput fell from 799 to 387 jobs/s at 800 offered/s during a 2h pinned phase. Every per-row-state-machine Postgres queue (River, Oban, pg-boss, Graphile, pgmq) shows the same class of behaviour; pgque holds at 799/s by trading feature surface (no per-row retries, heartbeats, or cancellation) for immunity.

The 0.6 plan is to land the live terminal counter (#290) and an open-ended HOT-update audit pass to take the visible hot-table scans out of the operator path. That treats the symptoms. The architectural fix lives one level deeper: the lifecycle table itself is the wrong shape for sustained, long-reader-exposed workloads.

The spike investigation captured this in docs/issue-169-storage-spike.md. The high-level direction:

  • Append-only rotation segments for job lifecycle, not a mutable lifecycle table.
  • Tiny per-segment claim ledger (the only mutable hot rows in the design).
  • Cursor allocator moving readers forward across segments rather than scanning prunable history.

The spike concluded the storage shape is right but the claim allocator is the load-bearing piece — the design has to start there, not at the segment format.
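
To make the three pieces above concrete, here is a minimal in-memory sketch of how they might fit together. It is a toy model, not the design from the spike doc: the names `Segment`, `SegmentStore`, `claimed_upto`, and `segment_size` are all hypothetical, there is no concurrency, and retries, heartbeats, and cancellation are deliberately left out because those are exactly the open questions.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One rotation segment: an append-only job log plus a tiny claim ledger."""
    seg_id: int
    jobs: list = field(default_factory=list)  # append-only, never updated in place
    claimed_upto: int = 0                     # the ledger: the only mutable state
    sealed: bool = False                      # sealed segments accept no new jobs


class SegmentStore:
    def __init__(self, segment_size: int = 3):
        self.segment_size = segment_size      # hypothetical rotation threshold
        self.segments = [Segment(seg_id=0)]
        self.cursor = 0                       # oldest segment that may hold unclaimed jobs

    def enqueue(self, payload: str) -> None:
        tail = self.segments[-1]
        if len(tail.jobs) >= self.segment_size:
            tail.sealed = True                # rotate: seal the tail, open a new one
            tail = Segment(seg_id=tail.seg_id + 1)
            self.segments.append(tail)
        tail.jobs.append(payload)

    def claim(self):
        """Return (seg_id, offset, payload) for the next job, or None if drained."""
        while self.cursor < len(self.segments):
            seg = self.segments[self.cursor]
            if seg.claimed_upto < len(seg.jobs):
                offset = seg.claimed_upto
                seg.claimed_upto += 1         # one ledger bump per claim
                return seg.seg_id, offset, seg.jobs[offset]
            if not seg.sealed:
                return None                   # caught up with the open tail segment
            self.cursor += 1                  # sealed and fully claimed: move forward
        return None

    def prune(self) -> int:
        """Drop whole segments behind the cursor; return how many were dropped."""
        dropped = self.cursor
        del self.segments[:dropped]
        self.cursor = 0
        return dropped
```

The point of the sketch is that a claim never revisits history behind the cursor, and cleanup removes whole segments rather than individual rows. Whether a real Postgres implementation can keep that property under concurrent claimers is what the allocator question below is asking.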

Scope of this RFC

What we want input on:

  1. Claim allocator design. What is the simplest allocator that gives SKIP LOCKED-equivalent fairness, supports per-row retries / heartbeats / cancellation, and does NOT degrade under a pinned MVCC horizon? Is it a single advisory-locked cursor table? A per-shard cursor with hash-routed jobs? Something else?
  2. Receipt plane integration. ADR-023's ring partitioning + ADR-026's narrow terminal history are good ideas that should survive the redesign. How do they compose with rotation segments?
  3. Migration story. Awa just did a 0.5→0.6 canonical→queue_storage migration. A 0.6→0.7 segment-storage migration is a real cost. What does the cutover shape look like — staged like 0.5.x→0.6, or a hard cutoff?
  4. Comparable designs. What can we learn from FoundationDB's record layer, ScyllaDB's commit log, or any append-only segment design in the queue space? The pgque design is close in spirit to what we want; what stops it from supporting per-row retries / heartbeats?
  5. TLA+ coverage. Awa's existing TLA+ models check queue_storage's lane identity, receipt claims, and terminal rollups. What needs to be re-modelled before this design ships?
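
As a starting point for discussing item 1, the "per-shard cursor with hash-routed jobs" option could look roughly like the sketch below. Everything here is an assumption for illustration: `shard_for`, `ShardedCursors`, the SHA-256 routing, and the round-robin claim order are hypothetical and not part of Awa. The model is single-threaded and in-memory, so it says nothing yet about locking or MVCC behaviour.

```python
import hashlib


def shard_for(job_key: str, num_shards: int) -> int:
    """Stable routing: the same key always maps to the same shard."""
    digest = hashlib.sha256(job_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedCursors:
    def __init__(self, num_shards: int):
        self.num_shards = num_shards
        self.logs = [[] for _ in range(num_shards)]  # one append-only log per shard
        self.cursors = [0] * num_shards              # one forward-only cursor per shard
        self.next_shard = 0                          # round-robin starting point

    def enqueue(self, job_key: str) -> int:
        shard = shard_for(job_key, self.num_shards)
        self.logs[shard].append(job_key)
        return shard

    def claim(self):
        """Return (shard, job_key) from the next non-empty shard, or None."""
        for step in range(self.num_shards):
            shard = (self.next_shard + step) % self.num_shards
            if self.cursors[shard] < len(self.logs[shard]):
                job = self.logs[shard][self.cursors[shard]]
                self.cursors[shard] += 1             # cursor only ever moves forward
                self.next_shard = (shard + 1) % self.num_shards
                return shard, job
        return None
```

In this shape each shard preserves FIFO order on its own, and rotating the starting shard keeps one busy shard from starving the others. It does not give global FIFO ordering, which is one of the trade-offs against a single advisory-locked cursor table.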

What is out of scope for the RFC:

  • Code. No PRs against this issue until the design is settled.
  • Benchmarks of unimplemented designs. The acceptance bar will be a fresh long_horizon run against an actual implementation.
  • Storage migration tooling specifics — that comes after the design.

Constraints

  • Postgres-only (ADR-001). No extensions Awa doesn't already require.
  • PG14+ floor, PG18 supported.
  • Public API compatibility with v0.6 where feasible — the storage swap should be operator-visible (migration step) but not handler-visible.
  • All ADR-029 transactional follow-up semantics must compose with the new design.

How to contribute

Comment on this issue. Once a direction has consensus, open pull requests against docs/adr/ for a draft ADR-030. Reference the spike doc and the long-horizon benchmark evidence at #169 when arguing for or against a direction.
