fix(streaming): use idle read_timeout for SSE, not total request timeout #5
Merged
reqwest's RequestBuilder::timeout caps the WHOLE request including the streamed body, so SSE/chunked responses were hard-killed after the hardcoded 300s regardless of activity — every SSE stream died at 5 min. Move to a client-level read_timeout (idle, reset on every byte): a healthy stream with periodic keep-alive frames (~10s) never trips it and can run indefinitely; only a genuinely silent upstream is reaped. Bump to 1.0.10.
Problem
SSE/streaming responses were hard-killed after 5 minutes regardless of activity. Root cause: entrypoint/protocol/streaming_handler.rs passed a hardcoded 300 to proxy::streaming::forward_streaming, which set it as reqwest's RequestBuilder::timeout, a total request deadline that includes reading the streamed body. So every SSE stream died at 300s even with continuous data, and the a3s-gateway.io/request-timeout annotation had no effect on streams.
Fix
Move the streaming timeout to a client-level read_timeout (idle) in proxy/streaming.rs and drop the per-request total timeout. read_timeout resets on every byte, so a healthy stream with periodic keep-alive frames (the API emits one every ~10s) never trips it and can run indefinitely; only a genuinely silent upstream is reaped after the idle window. (RequestBuilder::read_timeout does not exist in reqwest 0.12; ClientBuilder::read_timeout does, hence the client-level placement.)
Non-streaming requests use a different client and are unaffected.
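The difference between the two timeout semantics can be sketched with a small std-only model. This is an illustration, not the gateway's code and not reqwest internals: Policy, Outcome, simulate and keep_alives are hypothetical names, and the 60s idle window used in the usage note is an assumed value (the PR does not state the window that was configured).

```rust
use std::time::Duration;

/// Hypothetical timeout policies; illustrative names, not reqwest types.
#[derive(Debug, Clone, Copy)]
pub enum Policy {
    /// Deadline measured from request start, covering the whole body.
    Total(Duration),
    /// Maximum gap allowed between two consecutive reads.
    Idle(Duration),
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Completed,
    TotalDeadlineExceeded { at: Duration },
    IdleTimeout { at: Duration },
}

/// Models one streamed response.
/// `arrivals` are sorted offsets from request start at which bytes arrive;
/// `end` is the offset at which the upstream closes the stream.
pub fn simulate(policy: Policy, arrivals: &[Duration], end: Duration) -> Outcome {
    match policy {
        // A total deadline ignores activity: only the stream length matters.
        Policy::Total(limit) => {
            if end > limit {
                Outcome::TotalDeadlineExceeded { at: limit }
            } else {
                Outcome::Completed
            }
        }
        // An idle timeout restarts after every read: only gaps matter.
        Policy::Idle(window) => {
            let mut last = Duration::ZERO;
            for &t in arrivals.iter().chain(std::iter::once(&end)) {
                if t.saturating_sub(last) > window {
                    return Outcome::IdleTimeout { at: last + window };
                }
                last = t;
            }
            Outcome::Completed
        }
    }
}

/// Offsets of keep-alive frames sent every `period`, strictly before `end`.
pub fn keep_alives(period: Duration, end: Duration) -> Vec<Duration> {
    assert!(!period.is_zero(), "period must be non-zero");
    let mut out = Vec::new();
    let mut t = period;
    while t < end {
        out.push(t);
        t += period;
    }
    out
}
```

Under this model, a 340s stream with a frame every 10s is cut at 300s by Policy::Total(300s) but completes under Policy::Idle(60s), while a stream that goes silent after its 20s frame is reaped by the idle policy at 80s.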
Verification
Built musl 1.0.10 (with --features kube,redis), deployed to prod gateway + edge, and ran an end-to-end endurance test: a real SSE stream through the gateway survived 340s (68 ticks) vs the old hard cut at 300s. Entry health 6/6 200 throughout.
Bumps version 1.0.7 → 1.0.10.