fix: ensure client keepalive < server keepalive to avoid client keepalive desync errors #1555

Open
ananthsub wants to merge 1 commit into NVIDIA-NeMo:main from ananthsub:ananthsub/client-server-keepalive-timeouts

Conversation

@ananthsub

Contributor

uvicorn closes idle keep-alive connections after 5s by default (ref), while the aiohttp client keeps them pooled for 15s by default (ref). This causes ServerDisconnectedError when the client reuses a socket the server has already closed.

This PR sets the client TCPConnector keepalive_timeout to 15s and the uvicorn timeout_keep_alive to 30s, so that the client retires an idle pooled socket before the server closes it and pooled sockets are live when reused.
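For illustration, the relationship between the two settings could be sketched roughly as below. This is a minimal sketch, not the code in this PR: the constant and helper names (`CLIENT_KEEPALIVE_S`, `SERVER_KEEPALIVE_S`, `keepalive_margin`, `make_client_session`, `run_server`) are hypothetical, and only `aiohttp.TCPConnector(keepalive_timeout=...)` and `uvicorn.run(..., timeout_keep_alive=...)` are real library parameters.

```python
# Values taken from the PR description; all names here are illustrative.
CLIENT_KEEPALIVE_S = 15  # aiohttp TCPConnector keepalive_timeout
SERVER_KEEPALIVE_S = 30  # uvicorn timeout_keep_alive


def keepalive_margin(client_s: float, server_s: float) -> float:
    """Return server_s - client_s, raising if the client may outlive the server."""
    if client_s >= server_s:
        raise ValueError(
            f"client keepalive ({client_s}s) must be shorter than "
            f"server keepalive ({server_s}s)"
        )
    return server_s - client_s


def make_client_session():
    """Build a session whose pooled idle sockets are dropped after 15s.

    Call this from inside a running event loop.
    """
    import aiohttp  # imported lazily so the check above has no dependencies

    connector = aiohttp.TCPConnector(keepalive_timeout=CLIENT_KEEPALIVE_S)
    return aiohttp.ClientSession(connector=connector)


def run_server(app, host: str = "127.0.0.1", port: int = 8000) -> None:
    """Serve an ASGI app, keeping idle connections open for 30s."""
    import uvicorn  # imported lazily for the same reason

    uvicorn.run(app, host=host, port=port, timeout_keep_alive=SERVER_KEEPALIVE_S)


# Fail fast at import time if someone edits the constants into a bad state.
keepalive_margin(CLIENT_KEEPALIVE_S, SERVER_KEEPALIVE_S)
```

With the defaults described above (client 15s, server 5s), `keepalive_margin(15, 5)` would raise, which is the misconfiguration this PR removes.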

cmunley1 previously approved these changes Jun 10, 2026
Comment thread nemo_gym/server_utils.py Outdated
@ananthsub ananthsub force-pushed the ananthsub/client-server-keepalive-timeouts branch from b7a8b6e to b04120b Compare June 10, 2026 08:37
@ananthsub ananthsub requested a review from cmunley1 June 10, 2026 09:10
@ananthsub ananthsub force-pushed the ananthsub/client-server-keepalive-timeouts branch from b04120b to 2235ecc Compare June 11, 2026 03:34
@copy-pr-bot

copy-pr-bot Bot commented Jun 11, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

…nectedError

uvicorn closes idle keep-alive connections after 5s by default while the
aiohttp client keeps them pooled for 15s, causing ServerDisconnectedError
when the client reuses a socket the server already closed. Set the client
TCPConnector keepalive_timeout to 15s and the uvicorn timeout_keep_alive to
30s so pooled sockets are guaranteed live when reused.

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
@ananthsub ananthsub force-pushed the ananthsub/client-server-keepalive-timeouts branch from 2235ecc to 46c9356 Compare June 16, 2026 17:08
@ananthsub ananthsub enabled auto-merge (squash) June 16, 2026 17:25

2 participants