durability: Flush batches #4478

Merged: clockwork-labs-bot merged 3 commits into kim/commitlog/append-commit-reopen from kim/durability/flush-batch, Feb 27, 2026

Conversation

@kim (Contributor) commented Feb 26, 2026

Instead of periodic flush + sync, the simplified commitlog buffering makes it easy to just pop a batch of transactions from the queue, commit them, and then flush + sync after each batch.

This may lead to an overall higher number of fsyncs on the system, but also adapts to the tx throughput of the individual database (up to the batch size).

After taking some measurements, we may want to make the batch size configurable at runtime.

Depends-on: #4404

@cloutiertyler (Contributor)

I think we need both an interval and a batch, no? If a database has very low throughput, this will fsync very infrequently and spike latency. (e.g. one client doing a txn every minute or so).

@kim (Contributor, Author) commented Feb 26, 2026

The idea is to fsync every batch (we did flush each tx before), amortising the cost by batching up to 32 txs.

recv_many may return a single element if there aren’t more in the channel.

@cloutiertyler (Contributor) commented Feb 26, 2026

The batching approach improves latency under low load (no waiting for timer ticks), but I'm concerned the fixed batch size of 32 couples throughput to fsync latency in a way the current time-based approach doesn't.

With time-based sync, throughput is bounded by disk write speed. With batch-capped sync, throughput is bounded by batch_size / fsync_latency.

Example NVME-ish numbers: Assume 1KB per transaction, 3 GB/s disk throughput, and 1ms fsync latency.

  • Time-based (current): Sync every 10ms. Disk can write 30MB in that interval = 30,000 tx. Throughput: 3,000,000 tx/sec (disk-limited)
  • Batch-capped (new): Sync after 32 tx (32KB), fsync takes 1ms. Throughput: 32,000 tx/sec (fsync-latency-limited)

The batch-capped approach would run at ~1% disk utilization under sustained load, whereas I think time-based would be close to 100%. If fsync takes longer (e.g. 3ms), it's more like a theoretical max of ~10,000 tx/second. And the smaller the transactions, the lower the disk utilization.

The current implementation also appears to adapt to slow fsync via MissedTickBehavior::Delay, if I'm understanding correctly. If fsync takes longer than the interval, it looks like the next tick fires immediately when fsync completes, so we fsync continuously. But each sync includes all transactions that accumulated during the previous fsync. So in a high transaction throughput scenario, I think removing the 32 transaction bound would have the same behavior as the current logic, but with the added benefit of fsyncing continuously for lower throughput scenarios (rather than waiting for the interval), right?

If the concern is pulling too many transactions off the queue at once, the current implementation already does this, no? It uses an unbounded channel and syncs everything that has accumulated during the interval, with no cap.

recv_many already returns early when fewer than 32 transactions are queued, so removing or significantly increasing the cap would:

  • Preserve the latency improvement under low load (sync as soon as data arrives)
  • Not artificially limit throughput under high load (sync everything queued, then continue)

The cap acts like backpressure, but it's not responding to any actual signal (disk saturation, memory pressure). Is there a specific reason to bound it at 32 rather than letting it drain whatever's queued?

@cloutiertyler (Contributor) left a review:

New changes resolve my throughput concerns.

    clog.flush()?;
    tx_buf = spawn_blocking(move || -> io::Result<Vec<Transaction<Txdata<T>>>> {
        for tx in tx_buf.drain(..) {
            clog.commit([tx])?;
Contributor:

Why do we only commit one at a time, when we may have multiple transactions ready to be committed?

Contributor (Author):

This only appends the commit to a BufWriter, it doesn't translate to a write. That way, if a BufWriter flush results in a torn write, we don't lose the entire batch of transactions (we can't recover partially written commits).

        break;
    }
    // Reclaim burst capacity.
    tx_buf.shrink_to(self.batch_capacity.get());
Contributor:

Nit: We should probably only shrink if n < batch_capacity, so that we aren't constantly allocating and deallocating under high load.

Contributor (Author):

We shrink to the batch capacity -- you mean maybe define a high watermark capacity?

@clockwork-labs-bot clockwork-labs-bot merged commit d6e2ba5 into kim/commitlog/append-commit-reopen Feb 27, 2026
40 of 50 checks passed
@kim kim deleted the kim/durability/flush-batch branch February 28, 2026 05:21

4 participants