fix(synth): stream the synthesis call + idle-token deadline (#104) #106
Merged
factual-lookup's synthesis generates a large (~8k-token) answer that takes ~110-150s. The non-streaming client waits for the whole response under a 120s whole-call timeout, so it intermittently timed out mid-generation and re-ran the full generation 3x (~360s) before failing. That is the #104 wedge: it surfaced as "0 sources / aborted due to timeout", but the fetch stage was a red herring. The run had its sources and was in synth.

Route synthesis through the streaming client even in non-TTY/--json mode (callLLMStream accumulates and returns when there's no onToken). The streaming client bounds only the connect by the timeout, so a long-but-healthy stream completes in one pass (~130s, ~3x faster than the old slow path).

Add an idle-token deadline to the stream read (parseSSE idleMs): no token for timeoutMs cancels the stream and throws, so a genuine stall fails fast instead of hanging to the global --max-runtime (or forever, when it's unset). This also closes a pre-existing gap in the interactive path.

Retry still wraps the connect only. A mid-stream synthesis stall is, empirically (#104), a persistent upstream condition that re-issuing doesn't recover from, so re-streaming just burns 3x the wall-clock; we fail fast instead.

Tests: parseSSE idle-timeout (stall -> TimeoutError; prompt stream -> no false timeout) + a mid-stream-stall-surfaces test; the agent-loop mock now speaks SSE for streaming synth calls. Full suite green.

Validated e2e through dario: factual-lookup now passes in ~130s (was ~408s on the slow path / ~50% whole-call timeout). A residual intermittent upstream SSE stall remains (~1 in 4), tracked in #104; likely round-trip/tunnel aggravated, to be retested with deepdive running in-network next to dario.
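For readers who have not seen the streaming client, here is a minimal sketch of the "accumulate and return when there's no onToken" behavior described above. It is not the actual `callLLMStream`: the names `collectStream`, `OnToken`, and `fakeTokens` are hypothetical, and the token source is modeled as a plain async iterable rather than a real SSE response.

```typescript
// Hypothetical callback shape; the real onToken signature is not shown in this PR.
type OnToken = (token: string) => void;

// Consume a token stream to completion and return the full text.
// With onToken set (interactive mode), each token is also forwarded as it arrives.
// Without it (non-TTY/--json mode), the tokens are only accumulated.
async function collectStream(
  tokens: AsyncIterable<string>,
  onToken?: OnToken,
): Promise<string> {
  let text = "";
  for await (const token of tokens) {
    text += token;
    onToken?.(token);
  }
  return text;
}

// Stand-in for a model response, used only for illustration.
async function* fakeTokens(): AsyncGenerator<string> {
  yield "Hello";
  yield ", ";
  yield "world";
}
```

Because the caller gets the same accumulated string in both modes, a single code path can serve interactive and `--json` runs; only the presence of the callback differs.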
This was referenced Jun 15, 2026
What
Synthesis now streams (even in `--json`/non-TTY mode) and the stream is bounded by an idle-token deadline. Fixes the dominant #104 failure mode.

Root cause (the #104 "fetch wedge" was a red herring)
A live `--verbose` trace caught it: `factual-lookup` fetched its sources fine, then aborted in synthesis. Its answer is large (~8k tokens, ~110–150s to generate). The non-streaming client (`callLLM`) waits for the whole response under a 120s whole-call timeout, so it intermittently timed out mid-generation and re-ran the full generation 3× (~360s) before failing. This surfaced as "0 sources / aborted due to timeout" (0 sources only because the run died before emitting the JSON envelope).

Fix
- `synthesize.ts`: always use `callLLMStream` (it accumulates and returns when there's no `onToken`). The streaming client bounds only the connect by the timeout, so a long-but-healthy stream completes in one pass (~130s, ~3× faster) instead of tripping the whole-call cap.
- `llm-stream.ts`: add an idle-token deadline to the stream read (`parseSSE` `idleMs`): no token for `timeoutMs` cancels the stream and throws, so a genuine stall fails fast instead of hanging to the global `--max-runtime` (or forever, when unset, which closes a pre-existing gap in the interactive path). Retry still wraps the connect only; a mid-stream synthesis stall is empirically a persistent upstream condition, so re-issuing just burns 3× the wall-clock.

Tests
- `parseSSE`: idle timeout aborts a stalled stream → `TimeoutError`; a prompt stream with `idleMs` set completes with no false timeout.
- `callLLMStream`: a mid-stream stall surfaces (connect-only retry, no silent re-issue).
- `agent-loop` mock now emits SSE for streaming synth calls.

Validation (e2e through dario)
`factual-lookup` now passes in ~130s (was ~408s on the slow path / ~50% whole-call timeout).

Honest residual: an intermittent upstream SSE stall remains (~1 in 4 runs), where the stream stops mid-generation and the idle deadline fails it fast. Adding whole-call retry for this case was tried and reverted: the stall persists across the retry window, so it only made failures slower (~400s) with no better pass rate. This residual is tracked in #104 and is likely round-trip/tunnel aggravated (these runs reach dario over an SSH tunnel); it'll be retested with deepdive running in-network next to dario.
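To make the idle-token deadline concrete, the following is a minimal sketch of the technique, not the actual `parseSSE` code. It assumes the stream is exposed as an async iterable; `withIdleDeadline`, `stalledTokens`, and this `TimeoutError` class are hypothetical names introduced for illustration. Each read is raced against a timer that is reset per token, so a slow but steady stream never trips it, while a gap longer than `idleMs` throws.

```typescript
// Hypothetical error type; the PR's tests name a TimeoutError but do not show its definition.
class TimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "TimeoutError";
  }
}

// Re-yield items from `source`, throwing TimeoutError if the wait for the
// next item exceeds idleMs. The timer restarts for every item, so total
// stream duration is unbounded; only the gap between items is bounded.
async function* withIdleDeadline<T>(
  source: AsyncIterable<T>,
  idleMs: number,
): AsyncGenerator<T> {
  const it = source[Symbol.asyncIterator]();
  try {
    while (true) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const deadline = new Promise<never>((_, reject) => {
        timer = setTimeout(
          () => reject(new TimeoutError(`no token for ${idleMs}ms`)),
          idleMs,
        );
      });
      let result: IteratorResult<T>;
      try {
        result = await Promise.race([it.next(), deadline]);
      } finally {
        clearTimeout(timer); // stop the timer before handing the item to the consumer
      }
      if (result.done) return;
      yield result.value;
    }
  } finally {
    // Best-effort cancel of the upstream. A stalled source may never settle,
    // so this is deliberately not awaited.
    it.return?.().catch(() => undefined);
  }
}

// Stand-in for an upstream that emits one token and then stalls forever.
async function* stalledTokens(): AsyncGenerator<string> {
  yield "partial";
  await new Promise<void>(() => {}); // never resolves
  yield "unreachable";
}
```

Wrapping `stalledTokens()` with a short `idleMs` yields `"partial"` and then throws `TimeoutError`, which matches the fail-fast behavior described above: the caller sees the stall directly, with no retry around the stream read.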