Header pooling#7840
Open
danielmarbach wants to merge 7 commits into
Open
Conversation
5197831 to
3987f3c
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this does
Pools the
Dictionary<string, string>instances we allocate for message headers. A newDictionaryPool<TKey, TValue>lives inNServiceBus.Utils, andHeaderPoolinNServiceBus.Transportis a thin subclass with header-tuned defaults. We rent on the produce side (send, publish, reply, the audit fork, the multicast copy) and return after dispatch.The learning transport pump rents its incoming headers too, and
HeaderSerializernow populates a rented dictionary instead of allocating one. That needed a manualUtf8JsonReaderparse, becauseSystem.Text.Jsonalways allocates a fresh dictionary and can't fill a rented one.Why
Header dictionaries are a large chunk of what we allocate per message. Reusing them via
Clear()keeps the internal entry and bucket arrays, so the next rent skips the resizes that dominate header-copy traffic.I want to be honest about the shape of the evidence. On a representative run (in-memory transport, one downstream send per message, no fanout inflation): header entry arrays about halved per message, total per-message allocation down ~14%, time in GC down from 11.1% to 9.6%, throughput up ~7%.
I don't think the in-memory harness can tell us what a real endpoint looks like. It has no handler work and no real transport I/O, so it can't represent a handler's allocation mix or the async semantics of a real transport. The framework per-message saving is what's measured; how large a share of a production endpoint's total that is depends on the handler. I'd like us to validate on a real transport before calling this done.
Why ConcurrentStack over ConcurrentBag or an interlocked fast-slot
Three options, briefly:
ConcurrentBag(first attempt). Per-core work-stealing queues backed by a monitor. Once more sites pooled, it spent about 12.7% of CPU inMonitor.Enter_Slowpathand throughput dropped with it. The monitor is the problem.ThreadStatic, or a single interlocked fast-slot with a stack overflow. This is theArrayPool.Sharedpattern, and it doesn't carry over. Rent and return straddle an await: we rent on the pump or handler thread and return on a dispatch continuation thread, so there is no thread-local locality to exploit. AThreadStaticslot can't hand a dictionary back to its renter, and on a real transport it would fragment (returns pile up on dispatch threads, renters allocate fresh). A single interlocked fast-slot is just another hot CAS line every thread hammers, with no locality payoff, so the fast-slot-plus-overflow shape is complexity without benefit here. The in-memory transport hid all of this because its dispatch completes synchronously, which a real transport's async I/O does not.ConcurrentStack(chosen). Lock-free CAS, no monitor. The pool no longer shows up in the monitor profile, and every guarantee carries over from the bag version (soft cap, capacity preserved byClear,TrimExcesson oversized,minimumCapacity/EnsureCapacity, theclearDictionaryopt-out, null guard). One container, same API, same tests.Minor or major?
Most of this is additive.
DictionaryPool<TKey, TValue>,HeaderPool, and theCopyToextension are new public types, so they're safe for a minor.The one real question is the behavioral change. After dispatch, header dictionaries are now cleared and returned to the pool. Before this, they survived dispatch as orphaned objects you could still read. Reading headers after the pipeline completes was never part of the contract: the pipeline owns the header lifecycle for the duration of execution, and nothing in the docs promises the dictionaries stay populated afterward.
My call is minor, with a release note. The reasoning: the only code this affects is code that held a header reference and read it after dispatch, which "happened to work" but was never promised. A release note saying header dictionaries are pooled and cleared after dispatch, and that reading them after the pipeline completes is outside the contract, covers it. I don't think this rises to a major. If we treated any observable change to undocumented behavior as a major, we'd rarely be able to ship anything.
Two smaller things that don't affect the call. The learning transport's
HeaderSerializer.Deserializecan now throwJsonReaderException(a subclass ofJsonException) instead ofJsonExceptionon malformed input. It's internal and dev/test only, and the wire format thatSerializeproduces is unchanged. The manual parser is an internal learning-transport detail, not public surface.Outbox contract worth flagging
ImmediateDispatchTerminatorreturns every dispatched header dictionary to the pool after dispatch, including ones it didn't rent (same semantics asArrayPool). That's fine as long as outboxGethands out dictionaries it owns. A real outbox does, because it deserializes fresh from storage. The in-memory acceptance-testing outbox didn't: it returned shared references, and dispatch cleared and pooled them.AcceptanceTestingOutboxStorage.Getnow copies, the same way the non-durable persistence does.If someone ships an
IOutboxStoragewhoseGetreturns shared header references, it will break under pooling. Worth a line in the outbox contract.What's in the PR
DictionaryPool<TKey, TValue>,HeaderPool,CopyToextension. API approvals updated.MessageOperations,RoutingContextExtensions(multicast),InvokeAuditPipelineBehavior(audit).ImmediateDispatchTerminator,RoutingToDispatchConnector(multicast original).HeaderSerializer.DictionaryPoolTests(9) andHeaderSerializerTests.Open
Not validated on a real transport yet. Happy to drop the learning-pump changes if they feel out of scope here, and happy to revert anything that doesn't hold up. Please don't hesitate to challenge the backing choice or the numbers.