fix(transport): reuse inbound socket for replies in WebSocket fallback (EHOSTUNREACH on stale point-to-point routes)#163

Open
leoburti wants to merge 1 commit into ruvnet:main from leoburti:fix/ws-fallback-inbound-reuse

leoburti commented Jun 4, 2026

Problem

WebSocketFallbackTransport (the federation transport from #153) always dials a new outbound connection to reach a peer. On a direct point-to-point link (e.g. a macOS Thunderbolt bridge, no tailnet) the kernel's cloned route to the peer can go stale, so every fresh outbound connect fails:

connect EHOSTUNREACH 10.10.10.1:8770 - Local (10.10.10.2:51218)

The OS picks the correct source IP and still reports "no route". Meanwhile the inbound socket from that peer is perfectly usable — WebSocket is full-duplex. There's also no liveness/idle handling: maxIdleTimeoutMs is accepted but never used, and getOrCreateConnection trusts only readyState === OPEN, so a half-open socket on a link that never sends RST is reused blindly.

Full diagnosis in #162.

Fix

  • Index each server-accepted (inbound) socket by peer IP (inboundByHost).
  • In getOrCreateConnection, before dialing: reuse a live outbound (current behavior) → else a live inbound socket for the same peer IP → else dial (unchanged fallback, so no regression when no inbound exists).
  • Add a 5s ping/pong liveness probe + __alive gating: a direct link won't RST a dropped peer, so readyState can still read OPEN for seconds after the TCP connection is dead.
  • send() retries once with eviction if the chosen socket dies mid-send.
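
The selection order and the retry-once behavior listed above can be sketched as follows. This is a minimal illustration, not the code in this PR: `SocketLike`, `chooseSocket`, `evict`, `sendWithRetry`, the `dial` callback, and the `host:port` key for outbound sockets are all hypothetical names and assumptions, and `alive` stands in for the `__alive` flag.

```typescript
// Hypothetical minimal socket shape; the real transport holds WebSocket objects.
const OPEN = 1; // same numeric value as WebSocket.OPEN

interface SocketLike {
  readyState: number;
  alive: boolean; // stand-in for the PR's __alive flag
  send(data: string): void;
}

type Choice =
  | { kind: "outbound" | "inbound"; socket: SocketLike }
  | { kind: "dial" };

// A socket is usable only if it is OPEN and answered the last liveness probe.
const isLive = (s: SocketLike | undefined): s is SocketLike =>
  s !== undefined && s.readyState === OPEN && s.alive;

// Order described in the PR: live outbound, else live inbound, else dial.
function chooseSocket(
  outbound: Map<string, SocketLike>, // keyed by "host:port" (assumption)
  inboundByHost: Map<string, SocketLike>, // keyed by peer IP
  host: string,
  port: number,
): Choice {
  const out = outbound.get(`${host}:${port}`);
  if (isLive(out)) return { kind: "outbound", socket: out };
  const inbound = inboundByHost.get(host);
  if (isLive(inbound)) return { kind: "inbound", socket: inbound };
  return { kind: "dial" };
}

// Remove a dead socket from every index that still references it.
function evict(maps: Map<string, SocketLike>[], socket: SocketLike): void {
  maps.forEach((m) =>
    m.forEach((value, key) => {
      if (value === socket) m.delete(key);
    }),
  );
}

// Retry once: if the chosen socket throws mid-send, evict it and choose again.
function sendWithRetry(
  outbound: Map<string, SocketLike>,
  inboundByHost: Map<string, SocketLike>,
  host: string,
  port: number,
  data: string,
  dial: () => SocketLike, // hypothetical: opens a fresh outbound socket
): Choice["kind"] {
  for (let attempt = 0; attempt < 2; attempt++) {
    const choice = chooseSocket(outbound, inboundByHost, host, port);
    const socket = choice.kind === "dial" ? dial() : choice.socket;
    try {
      socket.send(data);
      return choice.kind; // which path carried the message
    } catch (err) {
      evict([outbound, inboundByHost], socket);
      if (attempt === 1) throw err; // second failure is reported to the caller
    }
  }
  throw new Error("unreachable");
}
```

Because dialing is the last branch, a peer with no inbound socket takes the same path as before, which is the backward-compatibility property the PR claims.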

Because each peer keeps one socket alive via its heartbeat, traffic rides it and no fresh (stale-route-prone) dial is needed. Backward compatible: with no inbound socket present, behavior is identical to before.
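
The PR does not show the probe itself, so the following is only a sketch of a common heartbeat pattern that matches the description: each round drops any socket that has not answered since the previous round, then pings the rest. `ProbedSocket`, `markAlive`, and `probeOnce` are hypothetical names; `ping()` and `terminate()` mirror the method names of the `ws` package, which the transport is assumed (not confirmed) to use.

```typescript
// Hypothetical minimal shape of a socket that can be probed.
interface ProbedSocket {
  alive: boolean; // stand-in for the PR's __alive flag
  ping(): void; // sends a WebSocket ping frame
  terminate(): void; // closes the socket without waiting for a close handshake
}

// Call from the socket's pong handler.
function markAlive(socket: ProbedSocket): void {
  socket.alive = true;
}

// One probe round. Returns the sockets that were dropped as half-open.
function probeOnce(sockets: Set<ProbedSocket>): ProbedSocket[] {
  const dropped: ProbedSocket[] = [];
  sockets.forEach((socket) => {
    if (!socket.alive) {
      // No pong since the previous round: treat the peer as gone.
      socket.terminate();
      sockets.delete(socket);
      dropped.push(socket);
      return;
    }
    // Assume dead until the next pong proves otherwise.
    socket.alive = false;
    socket.ping();
  });
  return dropped;
}

// In a running transport this would be scheduled every 5 seconds, e.g.
// setInterval(() => probeOnce(sockets), 5000), and cleared on shutdown.
```

With a 5s interval, a silent peer is detected on the probe round after its first missed pong, so somewhere between 5 and 10 seconds after it stops responding.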

Validation

  • Production: two macOS peers over a Thunderbolt bridge — the failing direction went from 0/N to N/N (attempts:1, zero EHOSTUNREACH).
  • New unit test (tests/transport/quic-loader.test.ts): a reply rides the inbound socket — the receiver gets the message while the replier's getStats().created stays 0 (proving reuse, not a dial).
  • Transport suite green (20/20), tsc clean.

Refs #162

The WebSocketFallbackTransport always dials a NEW outbound connection to
reach a peer. On a direct point-to-point link (e.g. a macOS Thunderbolt
bridge) the kernel's cloned route to the peer can go stale, so every
fresh outbound connect fails with `connect EHOSTUNREACH <peer> - Local
(<self>:<ephemeral>)` (the OS picks the correct source and still reports
no route) — while the inbound socket from that peer is perfectly usable.
There was also no liveness/idle handling (`maxIdleTimeoutMs` was accepted
but never used; `getOrCreateConnection` trusted only `readyState===OPEN`).

This makes the transport:
- index each server-accepted (inbound) socket by peer IP (`inboundByHost`)
- in `getOrCreateConnection`, before dialing: reuse a live outbound, else
  a live inbound socket for the same peer IP (WebSocket is full-duplex),
  else dial (unchanged fallback — no regression when no inbound exists)
- add a 5s ping/pong liveness probe + `__alive` gating so a half-open
  socket is detected and skipped (a direct link won't RST a dropped peer)
- retry-once with eviction in `send()` if the chosen socket dies mid-send

Validated in production over a Thunderbolt link (the failing direction
went from 0/N to N/N, attempts:1, zero EHOSTUNREACH) and with a new unit
test asserting the reply rides the inbound socket (`created` stays 0).

Refs ruvnet#162

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>