quic: make multiple improvements to packet #62589

Open
jasnell wants to merge 4 commits into nodejs:main from jasnell:jasnell/quic-packet-improvements
@jasnell
Member

@jasnell jasnell commented Apr 4, 2026

Previously, Packets were ReqWrap objects with a shared free-list. This commit changes to a per-Endpoint arena with no V8 involvement. This is the design I originally had in mind, but I initially went with the simpler freelist approach to get something working. There's too much overhead in the ReqWrap/freelist approach, and individual packets do not really need to be observable via async hooks.

This design should eliminate the risk of memory fragmentation and eliminate a significant bottleneck in the hot path.

Summary of improvements:

Memory Comparison

| Metric | Before | After | Delta |
|---|---|---|---|
| Per-packet memory | ~2,140 bytes | 1,712 bytes | -20% |
| Heap allocations per acquire | 3-4 (Packet, Data, shared_ptr control, V8 object) | 0 (pre-allocated in block) | eliminated |
| Heap allocations per reuse (freelist hit) | 2 (Data, shared_ptr control) | 0 | eliminated |
| V8 heap per packet | ~200-400 bytes (JS object) | 0 | eliminated |
| Block allocation (128 slots) | N/A | 214 KB (one new char[]) | amortized across 128 acquires |
| Per-packet allocator overhead | ~48-96 bytes (malloc headers × 3-4 allocs) | 0 (inline in block) | eliminated |

Fragmentation

Before: Each packet reuse from the freelist still called std::make_shared<Data>(length, label) — a new heap allocation for the Data object + its shared_ptr control block + the std::string diagnostic label. These are small, variably-sized allocations scattered across the heap.

After: All slots are identical 1,712-byte regions within contiguous 214 KB blocks. Zero per-packet heap allocations during steady-state operation. The only allocations happen when a new block is grown.
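As a quick check on the figures above, the block size follows directly from the slot size and slot count. This is a minimal sketch: the constant and function names are hypothetical, and only the numbers 1,712 and 128 come from the description.

```cpp
#include <cstddef>

// Figures taken from the description above; the names are hypothetical.
constexpr std::size_t kSlotSize = 1712;      // bytes per slot
constexpr std::size_t kSlotsPerBlock = 128;  // slots per block
constexpr std::size_t kBlockSize = kSlotSize * kSlotsPerBlock;

static_assert(kBlockSize == 219136, "128 slots of 1,712 bytes each");
static_assert(kBlockSize / 1024 == 214, "reported as 214 KB (1 KB = 1024 bytes)");

// Byte offset of slot `index` inside a block: a uniform stride, so every
// slot is an identical region and no per-slot size bookkeeping is needed.
constexpr std::size_t SlotOffset(std::size_t index) {
  return index * kSlotSize;
}
```

Because every slot has the same size, a freed slot can always satisfy the next acquire, which is why fragmentation inside a block cannot occur.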

Performance Comparison

Acquire (hot path — called up to 32× per SendPendingData)

Before (freelist hit):

  1. BindingData::Get(env) — resolve binding data from environment
  2. packet_freelist.front() / pop_front() — std::list dereference (random memory access)
  3. std::make_shared<Data>(length, label) — heap allocate Data + control block
  4. std::string constructor for diagnostic label — potential heap allocation
  5. Set listener, destination, data pointer on Packet

Before (freelist miss):

  1. JS_NEW_INSTANCE_OR_RETURN — allocate V8 JS object (GC pressure, potentially triggers GC)
  2. MakeBaseObject<Packet>(...) — heap allocate Packet
  3. std::make_shared<Data>(...) — heap allocate Data + control block
  4. ClearWeak() — modify V8 weak handle state

After (always):

  1. Pop from intrusive free list — slot = free_list_; free_list_ = slot->next_free; (2 pointer ops)
  2. Increment in_use_count_ on block and pool (2 increments)
  3. Placement new Packet in pre-allocated memory (zero-initializes uv_udp_send_t, copies SocketAddress)

The new acquire is essentially 2 pointer operations + a placement new. No heap allocation, no V8 involvement, no atomic operations (shared_ptr control block had atomics).
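For readers who want to see the shape of that acquire path, here is a minimal sketch of an intrusive free list with placement new. It is not the code in this PR: `SketchPacket`, `Slot`, and `ArenaSketch` are hypothetical stand-ins, the pool has a single fixed block, and the real Packet carries a uv_udp_send_t and a SocketAddress rather than a bare length.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical stand-in for the real Packet type.
struct SketchPacket {
  std::size_t length;
  explicit SketchPacket(std::size_t len) : length(len) {}
};

// One slot: a free-list link followed by raw storage for one packet.
struct Slot {
  Slot* next_free = nullptr;
  alignas(SketchPacket) unsigned char storage[sizeof(SketchPacket)];
};

class ArenaSketch {
 public:
  explicit ArenaSketch(std::size_t slot_count) : slots_(slot_count) {
    // Thread every slot onto the free list once, up front.
    for (Slot& slot : slots_) {
      slot.next_free = free_list_;
      free_list_ = &slot;
    }
  }

  // Returns nullptr when the block is exhausted (a real pool would grow
  // by allocating another block instead).
  SketchPacket* Acquire(std::size_t length) {
    Slot* slot = free_list_;
    if (slot == nullptr) return nullptr;
    free_list_ = slot->next_free;  // pop: two pointer operations
    ++in_use_count_;
    return new (slot->storage) SketchPacket(length);  // no heap call
  }

  std::size_t in_use() const { return in_use_count_; }

 private:
  std::vector<Slot> slots_;  // sized once in the constructor, never resized
  Slot* free_list_ = nullptr;
  std::size_t in_use_count_ = 0;
};
```

In this sketch the only allocation is the vector in the constructor; every `Acquire` after that touches only memory that already exists.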

Release (send callback — every completed packet)

Before:

  1. BaseObjectPtr construction from raw pointer — atomic increment
  2. MakeWeak() — modify V8 weak handle
  3. Check IsDispatched(), call listener
  4. data_.reset() — atomic decrement on shared_ptr, may free Data
  5. Reset() — reset uv_udp_send_t state
  6. packet_freelist.push_back(std::move(self)) — std::list node allocation (!)
  7. Or if freelist full: destroy Packet → V8 GC eventually collects JS object

After:

  1. Packet::FromReq(req) — ContainerOf pointer arithmetic (compile-time offset)
  2. Call listener
  3. ArenaPool::Release(p) — ~Packet() (trivial), then ReleaseSlot:
    • Pointer arithmetic to recover SlotHeader
    • slot->next_free = free_list_; free_list_ = slot; (2 pointer ops)
    • Decrement 2 counters
    • MaybeGC() check (branch, rarely taken)

The new release is pointer arithmetic + 2 pointer operations + 2 decrements. No atomic operations, no heap free, no V8 interaction.
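The release steps can be sketched the same way. This assumes a layout where the request is embedded in the packet and the packet storage is embedded in the slot, so both recoveries are constant-offset pointer arithmetic. All type names here are hypothetical, `SendReq` stands in for uv_udp_send_t, and the counters and MaybeGC() check are reduced to a single counter.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct SendReq { int status = 0; };  // stand-in for uv_udp_send_t

struct PacketSketch {
  std::size_t length = 0;
  SendReq req;  // embedded request, as handed to the send callback

  // Recover the owning packet from its embedded request (container-of).
  static PacketSketch* FromReq(SendReq* r) {
    return reinterpret_cast<PacketSketch*>(
        reinterpret_cast<char*>(r) - offsetof(PacketSketch, req));
  }
};

struct ReleaseSlot {
  ReleaseSlot* next_free = nullptr;
  alignas(PacketSketch) unsigned char storage[sizeof(PacketSketch)];
};

class ReleasePoolSketch {
 public:
  ReleasePoolSketch() {
    for (ReleaseSlot& s : slots_) {
      s.next_free = free_list_;
      free_list_ = &s;
    }
  }

  PacketSketch* Acquire() {
    ReleaseSlot* s = free_list_;
    if (s == nullptr) return nullptr;
    free_list_ = s->next_free;
    ++in_use_;
    return new (s->storage) PacketSketch();
  }

  void Release(PacketSketch* p) {
    assert(p != nullptr && in_use_ > 0);
    p->~PacketSketch();  // trivial here
    // Step back from the packet storage to the enclosing slot.
    auto* s = reinterpret_cast<ReleaseSlot*>(
        reinterpret_cast<char*>(p) - offsetof(ReleaseSlot, storage));
    s->next_free = free_list_;  // push: two pointer operations
    free_list_ = s;
    --in_use_;
  }

  std::size_t in_use() const { return in_use_; }

 private:
  ReleaseSlot slots_[4];
  ReleaseSlot* free_list_ = nullptr;
  std::size_t in_use_ = 0;
};

// Shape of a send-completion callback: request -> packet -> slot.
inline void OnSendDone(ReleasePoolSketch& pool, SendReq* req) {
  pool.Release(PacketSketch::FromReq(req));
}
```

Because the free list is LIFO, the slot released most recently is the first one handed out again, which tends to keep the hot slots in cache.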

Send path (UDP::Send)

Before: ClearWeak() + Dispatched() + uv_udp_send() + on error: Done() + MakeWeak()

After: Ptr::release() (1 pointer swap) + uv_udp_send() + on error: ArenaPool::Release()
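The ownership handoff in the new send path can be illustrated with a std::unique_ptr and a pool-aware deleter. This is only a sketch under assumptions: whether the PR's `Ptr` is actually a std::unique_ptr alias is not stated above, `FakeSend` stands in for uv_udp_send(), and `CountingPool` merely counts releases instead of recycling slots.

```cpp
#include <memory>

struct Pkt { int id = 0; };

// Hypothetical pool that only counts releases.
struct CountingPool {
  int released = 0;
  void Release(Pkt*) { ++released; }
};

// Deleter that returns the packet to its pool instead of calling delete.
struct PoolDeleter {
  CountingPool* pool;
  void operator()(Pkt* p) const { pool->Release(p); }
};
using PktPtr = std::unique_ptr<Pkt, PoolDeleter>;

// Stand-in for uv_udp_send(): returns 0 on success, negative on failure.
// On success, `*pending` models the in-flight request that the completion
// callback will later see.
inline int FakeSend(Pkt* p, bool fail, Pkt** pending) {
  if (fail) return -1;
  *pending = p;
  return 0;
}

inline int Send(PktPtr packet, CountingPool& pool, bool fail, Pkt** pending) {
  Pkt* raw = packet.release();  // give up ownership: one pointer swap
  int err = FakeSend(raw, fail, pending);
  // On failure no callback will ever fire, so the packet goes straight
  // back to the pool. On success the callback is responsible for it.
  if (err != 0) pool.Release(raw);
  return err;
}
```

The point of the pattern is that exactly one party owns the packet at any time: the smart pointer before the send, the in-flight request after a successful send, and the pool otherwise.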

SendPendingData loop (up to 32 packets per call)

Before: Each iteration potentially triggered JS_NEW_INSTANCE_OR_RETURN (V8 object allocation) on freelist miss, plus std::make_shared<Data> on every iteration.

After: Each iteration is just a free list pop + placement new. For a full 32-packet burst from a warm pool, this is ~32 × (2 pointer ops + a memset/memcpy for the Packet fields) — essentially zero allocation cost.

GC pressure

Before: Each Packet had a persistent V8 JS object. When the freelist was full (>100 packets), excess packets were destroyed, leaving their V8 objects for the garbage collector. Under high throughput, this created ongoing GC pressure proportional to packet churn.

After: Zero V8 objects. Zero GC pressure from packets. The ArenaPool::MaybeGC() only runs when >50% of total slots are free and only frees entire blocks — a rare bulk operation, not per-packet work.
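The trigger condition described for MaybeGC() can be sketched with counters alone. This is an assumption-heavy illustration: it tracks only per-block usage counts, frees every fully idle block once more than 50% of all slots are free, and ignores the free-list fix-up a real pool would need. Whether the actual implementation keeps a minimum number of blocks is not stated above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct BlockInfo {
  std::size_t slots = 128;  // slot count per block, from the description
  std::size_t in_use = 0;
};

class GcSketch {
 public:
  void AddBlock(std::size_t in_use) {
    BlockInfo b;
    b.in_use = in_use;
    blocks_.push_back(b);
  }

  // Returns the number of blocks freed.
  std::size_t MaybeGC() {
    std::size_t total = 0;
    std::size_t used = 0;
    for (const BlockInfo& b : blocks_) {
      total += b.slots;
      used += b.in_use;
    }
    const std::size_t free_slots = total - used;
    if (free_slots * 2 <= total) return 0;  // not more than 50% free

    // Only whole blocks with no live packets are eligible.
    const std::size_t before = blocks_.size();
    blocks_.erase(std::remove_if(blocks_.begin(), blocks_.end(),
                                 [](const BlockInfo& b) {
                                   return b.in_use == 0;
                                 }),
                  blocks_.end());
    return before - blocks_.size();
  }

  std::size_t block_count() const { return blocks_.size(); }

 private:
  std::vector<BlockInfo> blocks_;
};
```

With two blocks holding 100 and 0 live packets, 156 of 256 slots are free, so the idle block is dropped. With 128 and 0 live packets exactly half are free, so nothing happens.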

Summary

| Aspect | Improvement |
|---|---|
| Per-packet memory | ~20% smaller (1,712 vs ~2,140 bytes) |
| Heap fragmentation | Eliminated (contiguous block allocation) |
| Heap allocations per acquire | 0 (was 2-4) |
| V8 GC pressure | Eliminated entirely |
| Atomic operations per acquire/release | 0 (was 2+ from shared_ptr) |
| Cache locality | Improved (sequential slots in contiguous blocks) |
| Acquire cost | ~2 pointer ops + placement new (was: conditional heap alloc + V8 object + shared_ptr) |
| Release cost | ~4 pointer ops + 2 decrements (was: atomic decrement + V8 weak handle + list node alloc) |
| SendPendingData 32-packet burst | ~32 × (free-list pop + placement new) (was: 32 × potential heap alloc + V8 alloc) |
| Steady-state memory overhead | Fixed: 1 block = 214 KB for 128 slots (was: unbounded individual allocations) |

The biggest wins are eliminating the per-packet V8 object allocation (which could trigger GC) and the shared_ptr atomic operations on every acquire/release. For a high-throughput QUIC session sending 32 packets per SendPendingData call, the new path is essentially allocation-free after the first block is populated.

Signed-off-by: James M Snell <jasnell@gmail.com>
Assisted-by: Opencode:Opus 4.6

@jasnell jasnell requested a review from Qard April 4, 2026 18:00
@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/gyp

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Apr 4, 2026
jasnell added 2 commits April 4, 2026 11:04
Previously Packets were ReqWrap objects with a shared
free-list. This commit changes to a per-Endpoint arena
with no v8 involvement. This is the design I originally
had in mind but I initially went with the simpler
freelist approach to get something working. There's
too much overhead in the ReqWrap/freelist approach and
individual packets do not really need to be observable
via async hooks.

This design should eliminate the risk of memory fragmentation
and eliminate a significant bottleneck in the hot path.

Signed-off-by: James M Snell <jasnell@gmail.com>
Assisted-by: Opencode:Opus 4.6
Handful of additional improvements to the Packet class.

Signed-off-by: James M Snell <jasnell@gmail.com>
Assisted-by: Opencode:Opus 4.6
@jasnell jasnell force-pushed the jasnell/quic-packet-improvements branch from ead61aa to 66d349a Compare April 4, 2026 18:05
Signed-off-by: James M Snell <jasnell@gmail.com>
@jasnell jasnell force-pushed the jasnell/quic-packet-improvements branch from 66d349a to da1e78a Compare April 4, 2026 18:06
@jasnell jasnell requested a review from mcollina April 4, 2026 18:07
Signed-off-by: James M Snell <jasnell@gmail.com>
Member

@mcollina mcollina left a comment


lgtm
