Skip to content

Header pooling#7840

Open
danielmarbach wants to merge 7 commits into
masterfrom
header-pooling
Open

Header pooling#7840
danielmarbach wants to merge 7 commits into
masterfrom
header-pooling

Conversation

@danielmarbach

@danielmarbach danielmarbach commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

What this does

Pools the Dictionary<string, string> instances we allocate for message headers. A new DictionaryPool<TKey, TValue> lives in NServiceBus.Utils, and HeaderPool in NServiceBus.Transport is a thin subclass with header-tuned defaults. We rent on the produce side (send, publish, reply, the audit fork, the multicast copy) and return after dispatch.

The learning transport pump rents its incoming headers too, and HeaderSerializer now populates a rented dictionary instead of allocating one. That needed a manual Utf8JsonReader parse, because System.Text.Json always allocates a fresh dictionary and can't fill a rented one.

Why

Header dictionaries are a large chunk of what we allocate per message. Reusing them via Clear() keeps the internal entry and bucket arrays, so the next rent skips the resizes that dominate header-copy traffic.

I want to be honest about the shape of the evidence. On a representative run (in-memory transport, one downstream send per message, no fanout inflation): header entry arrays about halved per message, total per-message allocation down ~14%, time in GC down from 11.1% to 9.6%, throughput up ~7%.

I don't think the in-memory harness can tell us what a real endpoint looks like. It has no handler work and no real transport I/O, so it can't represent a handler's allocation mix or the async semantics of a real transport. The framework per-message saving is what's measured; how large a share of a production endpoint's total that is depends on the handler. I'd like us to validate on a real transport before calling this done.

Why ConcurrentStack over ConcurrentBag or an interlocked fast-slot

Three options, briefly:

  • ConcurrentBag (first attempt). Per-core work-stealing queues backed by a monitor. Once more sites pooled, it spent about 12.7% of CPU in Monitor.Enter_Slowpath and throughput dropped with it. The monitor is the problem.
  • A thread-affine slot: ThreadStatic, or a single interlocked fast-slot with a stack overflow. This is the ArrayPool.Shared pattern, and it doesn't carry over. Rent and return straddle an await: we rent on the pump or handler thread and return on a dispatch continuation thread, so there is no thread-local locality to exploit. A ThreadStatic slot can't hand a dictionary back to its renter, and on a real transport it would fragment (returns pile up on dispatch threads, renters allocate fresh). A single interlocked fast-slot is just another hot CAS line every thread hammers, with no locality payoff, so the fast-slot-plus-overflow shape is complexity without benefit here. The in-memory transport hid all of this because its dispatch completes synchronously, which a real transport's async I/O does not.
  • ConcurrentStack (chosen). Lock-free CAS, no monitor. The pool no longer shows up in the monitor profile, and every guarantee carries over from the bag version (soft cap, capacity preserved by Clear, TrimExcess on oversized, minimumCapacity/EnsureCapacity, the clearDictionary opt-out, null guard). One container, same API, same tests.

Minor or major?

Most of this is additive. DictionaryPool<TKey, TValue>, HeaderPool, and the CopyTo extension are new public types, so they're safe for a minor.

The one real question is the behavioral change. After dispatch, header dictionaries are now cleared and returned to the pool. Before this, they survived dispatch as orphaned objects you could still read. Reading headers after the pipeline completes was never part of the contract: the pipeline owns the header lifecycle for the duration of execution, and nothing in the docs promises the dictionaries stay populated afterward.

My call is minor, with a release note. The reasoning: the only code this affects is code that held a header reference and read it after dispatch, which "happened to work" but was never promised. A release note saying header dictionaries are pooled and cleared after dispatch, and that reading them after the pipeline completes is outside the contract, covers it. I don't think this rises to a major. If we treated any observable change to undocumented behavior as a major, we'd rarely be able to ship anything.

Two smaller things that don't affect the call. The learning transport's HeaderSerializer.Deserialize can now throw JsonReaderException (a subclass of JsonException) instead of JsonException on malformed input. It's internal and dev/test only, and the wire format that Serialize produces is unchanged. The manual parser is an internal learning-transport detail, not public surface.

Outbox contract worth flagging

ImmediateDispatchTerminator returns every dispatched header dictionary to the pool after dispatch, including ones it didn't rent (same semantics as ArrayPool). That's fine as long as outbox Get hands out dictionaries it owns. A real outbox does, because it deserializes fresh from storage. The in-memory acceptance-testing outbox didn't: it returned shared references, and dispatch cleared and pooled them. AcceptanceTestingOutboxStorage.Get now copies, the same way the non-durable persistence does.

If someone ships an IOutboxStorage whose Get returns shared header references, it will break under pooling. Worth a line in the outbox contract.

What's in the PR

  • New public API: DictionaryPool<TKey, TValue>, HeaderPool, CopyTo extension. API approvals updated.
  • Rent sites: MessageOperations, RoutingContextExtensions (multicast), InvokeAuditPipelineBehavior (audit).
  • Return sites: ImmediateDispatchTerminator, RoutingToDispatchConnector (multicast original).
  • Learning transport pump and HeaderSerializer.
  • DictionaryPoolTests (9) and HeaderSerializerTests.

Open

Not validated on a real transport yet. Happy to drop the learning-pump changes if they feel out of scope here, and happy to revert anything that doesn't hold up. Please don't hesitate to challenge the backing choice or the numbers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant