feat(adk): add subagent middleware with background task support #951
Codecov Report
Coverage diff for #951 against alpha/09:
- Coverage: 82.15%
- Files: 167
- Lines: 20680
- Branches: 0
- Hits: 16989
- Misses: 2498
- Partials: 1193
4c6d7e9 to
26b0af2
cb99776 to
ed8d98f
feat(agentic_model):
- format print
- support agentic chat template
- support composing agentic model & agentic tools node
- support agentic tool node
- support agentic message concat
Change-Id: If20fa78dba82a1c177c8ec47090050ea8c1354ed
* fix(adk): skip saving checkpoint when TurnLoop is idle

When Stop() is called on an idle TurnLoop (no active agent run, no unhandled items, no canceled items), the resulting checkpoint contains no meaningful state. Skip saving such checkpoints to avoid unnecessary store writes.
- Add isIdle check in cleanup() before checkpoint save decision
- Add TestTurnLoop_StopWhileIdle_SkipsCheckpoint test

Change-Id: I6aeaff5ed5833a971cb95298193fdb96d904baf8

* fix(internal): merge id2State in PopulateInterruptState instead of replacing

PopulateInterruptState merged id2Addr entries one by one but replaced id2State wholesale. In a parallel workflow resume, two goroutines share the same globalResumeInfo. If one goroutine's compose graph called PopulateInterruptState (replacing id2State with compose-only entries) before the other goroutine looked up its outer-level entry, the lookup returned a zero-value InterruptState with State=nil, triggering the 'has no state' panic in ChatModelAgent.Resume. Change id2State handling to merge entry by entry, consistent with id2Addr.

Change-Id: Ia21f65289bff7beb2bc383fb033926ad9c92d7e7

* fix(adk): keep watching for cancel escalation after stopSig.done

When watchStopSignal entered the stopSig.done branch, it processed the initial cancel and then blocked on <-done (turn completion), never looping back to check notify. This meant a subsequent Stop() call with a higher cancel mode (e.g. CancelImmediate) was never forwarded to the agent, causing TestTurnLoop_Stop_EscalatesCancelMode to time out. Replace the blocking <-done with an inner loop that selects on both done and notify, so escalation signals are always delivered. Also apply the generation-based dedup check consistent with the notify branch.

Change-Id: Ia6a04d00a2b44625ffbcb625ff0e559c12ed145f
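The escalation fix in the last commit above can be illustrated with a minimal sketch. It is not the TurnLoop code: `CancelMode`, `watchEscalation`, and the `forward` callback are hypothetical names, and the generation-based dedup check is left out. It only shows why selecting on both channels lets a later, stronger cancel through.

```go
package main

import "fmt"

// CancelMode is a hypothetical stand-in for the real cancel modes.
type CancelMode int

const (
	CancelAfterChatModel CancelMode = iota + 1
	CancelImmediate
)

// watchEscalation forwards every mode received on notify until done is closed.
// Selecting on both channels, instead of blocking on <-done alone, is what
// allows a second Stop() with a higher mode to reach the agent.
func watchEscalation(done <-chan struct{}, notify <-chan CancelMode, forward func(CancelMode)) {
	for {
		select {
		case <-done:
			return
		case m := <-notify:
			forward(m)
		}
	}
}

// runDemo sends an initial cancel, then an escalation, then ends the turn.
// It returns the modes the watcher forwarded, in order.
func runDemo() []CancelMode {
	done := make(chan struct{})
	notify := make(chan CancelMode) // unbuffered: each send waits for the watcher
	finished := make(chan struct{})
	var got []CancelMode

	go func() {
		defer close(finished)
		watchEscalation(done, notify, func(m CancelMode) { got = append(got, m) })
	}()

	notify <- CancelAfterChatModel
	notify <- CancelImmediate
	close(done)
	<-finished // the watcher has returned, so reading got is safe
	return got
}

func main() {
	fmt.Println(runDemo()) // prints [1 2]
}
```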
…r agent cancellation (#929)

* fix(adk): prevent panic when orphaned tool goroutine sends event after agent cancellation

When CancelAfterChatModel times out and escalates to CancelImmediate, GraphInterrupt fires with timeout=0. The compose graph returns immediately, orphaning parallel tool goroutines. When an orphaned tool completes, eventSenderToolWrapper tries to send an event via the AsyncGenerator which is already closed, causing a 'send on closed channel' panic.
- Add isImmediateCancelled() to cancelContext for checking immediateChan
- Make chatModelAgentExecCtx.send cancel-aware: skip send when immediate cancel is active
- Use trySend as safety net for the TOCTOU race window
- Route SendEvent() through execCtx.send() instead of direct generator.Send()

Change-Id: Ic7e0194c860e2692a3cddc559911ab379024f650

* test(adk): add test for orphaned tool goroutine panic after CancelImmediate
- unit_send_after_close: directly reproduces the panic by sending to a closed generator with isImmediateCancelled=true
- unit_send_after_close_without_cancel_ctx: verifies trySend safety net prevents panic even without cancelCtx
- integration_cancel_escalation_orphans_tool: end-to-end test with slow tool, CancelAfterChatModel timeout escalation, and orphaned goroutine

Change-Id: Ia82fa957b102ccc2ac42094d18d4b15db2a1701c

* test(adk): improve coverage for orphaned tool goroutine fix

Add test cases for:
- nil execCtx and nil generator defensive guards
- nil cancelContext in isImmediateCancelled
- TOCTOU race window (isImmediateCancelled=false but generator closed)
- SendEvent public API with closed generator
- SendEvent without exec context

Change-Id: I197c36f34675f5376cbe5f830b15db6ca873cd1f
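A cancel-aware send with a recover-based safety net, as described in the fix above, might look roughly like the following. This is a sketch on a plain Go channel, not the AsyncGenerator implementation: `eventSender`, its fields, and the body of `trySend` are assumptions made for illustration.

```go
package main

import "fmt"

// trySend sends v on ch and reports whether the send happened.
// Sending on a closed channel panics; recover turns that into ok=false.
// Note that it still blocks if ch is open but nobody is receiving.
func trySend(ch chan<- string, v string) (ok bool) {
	defer func() {
		if recover() != nil {
			ok = false
		}
	}()
	ch <- v
	return true
}

// eventSender is a hypothetical stand-in for the agent's execution context.
type eventSender struct {
	events             chan string
	immediateCancelled func() bool // nil means "no cancel context"
}

// send skips the event when the sender is unusable or an immediate cancel is
// active, and relies on trySend for the window where the cancel check passed
// but the channel was closed right afterwards (the TOCTOU race).
func (s *eventSender) send(ev string) bool {
	if s == nil || s.events == nil {
		return false
	}
	if s.immediateCancelled != nil && s.immediateCancelled() {
		return false
	}
	return trySend(s.events, ev)
}

func main() {
	ch := make(chan string, 1)
	s := &eventSender{events: ch}

	fmt.Println(s.send("tool finished")) // true: channel is open and has room
	close(ch)
	fmt.Println(s.send("late event")) // false: the panic is recovered
}
```

The recover in `trySend` is deliberately narrow in scope: the only statement that can panic inside it is the channel send, so it does not mask unrelated failures.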
…925)

* fix(adk): keep late turn loop items

Change-Id: Iabee0c25a83d5a25585d3592a41ca6a5fba35c2b

* docs(adk): clarify cancel wait semantics

Change-Id: Ia0a396b9cc2e43f15e85056d966f20b010dcd2b6

* feat(adk): add WithSkipCheckpoint and WithStopCause StopOptions

Add two new StopOption variants for TurnLoop.Stop():
- WithSkipCheckpoint: prevents checkpoint persistence on stop, for cases where the caller does not intend to resume in the future. The flag is sticky across escalation calls.
- WithStopCause: attaches a business-supplied reason string. Surfaced in TurnLoopExitState.StopCause and, after the Stopped channel closes, via TurnContext.StopCause(). Uses first-non-empty-wins semantics across multiple Stop() calls.

Thread both fields through stopSignal with proper mutex protection. Update cleanup() to skip checkpoint save when skipCheckpoint is set.

Change-Id: Ifeat-stop-options-skip-checkpoint-stop-cause
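The "sticky" and "first-non-empty-wins" semantics above are easy to get subtly wrong, so here is a small sketch of how they could be combined across repeated Stop() calls. The option names come from the commit message; their signatures, the `stopConfig` struct, and the `apply`/`snapshot` methods are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"sync"
)

// stopConfig collects the options passed to a single Stop() call.
type stopConfig struct {
	skipCheckpoint bool
	stopCause      string
}

// StopOption is modeled here as a functional option (an assumption).
type StopOption func(*stopConfig)

func WithSkipCheckpoint() StopOption { return func(c *stopConfig) { c.skipCheckpoint = true } }

func WithStopCause(cause string) StopOption {
	return func(c *stopConfig) { c.stopCause = cause }
}

// stopSignal accumulates state across several Stop() calls.
type stopSignal struct {
	mu             sync.Mutex
	skipCheckpoint bool
	stopCause      string
}

// apply merges one Stop() call into the accumulated state.
func (s *stopSignal) apply(opts ...StopOption) {
	var cfg stopConfig
	for _, opt := range opts {
		opt(&cfg)
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	if cfg.skipCheckpoint {
		s.skipCheckpoint = true // sticky: later calls cannot clear it
	}
	if s.stopCause == "" {
		s.stopCause = cfg.stopCause // first non-empty cause wins
	}
}

// snapshot reads both fields under the lock.
func (s *stopSignal) snapshot() (bool, string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.skipCheckpoint, s.stopCause
}

func main() {
	var sig stopSignal
	sig.apply(WithSkipCheckpoint())
	sig.apply(WithStopCause("user quit"))
	sig.apply(WithStopCause("timeout")) // escalation: ignored, cause already set

	skip, cause := sig.snapshot()
	fmt.Println(skip, cause) // prints: true user quit
}
```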
* fix: rebase error

Change-Id: If20fa78dba82a1c177c8ec47090050ea8c1354ed

* feat(adk): add failover support for ChatModel

Change-Id: Ice1b513b4b509e7b540316da9119ff3d529c9bae

* feat(adk): add failover support for ChatModel

Change-Id: Ice1b513b4b509e7b540316da9119ff3d529c9bae

* feat(adk): add failover support for ChatModel

Change-Id: Id5483447b74322f6dd495bdd3b994c001094569d

* feat(adk): make Name and Description optional in ChatModelAgentConfig

* feat(adk): add callback lifecycle management to failoverProxyModel
- Extract prepareCallbacks method to reuse callback setup logic between Generate and Stream methods
- Add callbacks.ReuseHandlers with proper RunInfo (model type + component) before each failover model invocation so handlers receive correct identity
- Add explicit OnStart/OnEnd/OnError callback invocations in Generate and Stream, since failoverProxyModel declares IsCallbacksEnabled() = true and the outer layer skips automatic callback injection

Change-Id: I0150529024125251828cf6f77c8247aa464b1f84

* fix(adk): preserve partial result in failoverProxyModel.Generate on error

Return result instead of nil when target.Generate fails, so that the outer failoverModelWrapper can pass the partial output message to ShouldFailover for inspection.

Change-Id: I32d86151a6e133f1a58d5e988bccf42d831a646c

* refactor(adk): use EnsureRunInfo in failoverProxyModel and separate ctx for callbacks
- Replace manual RunInfo construction + ReuseHandlers with callbacks.EnsureRunInfo for cleaner RunInfo setup
- Use nCtx (from EnsureRunInfo) for target model invocation and original ctx for OnStart/OnEnd/OnError callback lifecycle

Change-Id: I1d5982d0e1ceeaf8f6648b9c40c229b6a2b07ab8

---------

Co-authored-by: shentong.martin <shentong.martin@bytedance.com>
feat: tool search definition
…945)
- Add ToolAliases to prepareExecContext when building ToolsNodeConfig
- Add UnknownToolsHandler, ExecuteSequentially, ToolArgumentsHandler, and ToolAliases to applyBeforeAgent when rebuilding after BeforeAgent handlers modify tools
- Add tests covering argument alias remapping, name alias dispatch, alias preservation after handler rebuild, and handler-only tool registration with pre-configured aliases
…options, add UntilIdleFor (#942)
f20ee8f to
9661935
…esign

Merge the task management layer (previously adk/taskstate/) into adk/middlewares/subagent/, since TaskMgr is inherently Agent-aware: it owns execution (Run), resolves agents by name, and manages foreground/background/auto-background switching. Keeping it in a separate generic package added indirection without real reuse.

Key design decisions:
- TaskMgr is a concrete type, not an interface. The old Manager interface (Register+Handle closures) is replaced by a single Run(*RunInput) method that accepts an agent name and delegates internally. This eliminates the split-ownership problem where the caller had to wire up Handle.Complete/Fail callbacks manually.
- TaskMgr owns an agent registry (RegisterAgent) so it can resolve agents by name in Run(). This lays groundwork for future interrupt/resume support where TaskMgr needs to re-invoke agents.
- TaskStore abstraction is deferred. The interface was defined but never consumed: no code path reads from or writes to it. Rather than ship an unvalidated abstraction, we remove it and will introduce it when interrupt/resume or session persistence actually needs it. At that point the requirements will be concrete enough to design a correct interface on the first try.
- Removed recursion prevention (subagentCtxKey). The middleware is instantiated per Agent; whether sub-agents get their own sub-agent tools is the user's choice via their middleware config. The old context-marker guard prevented legitimate nesting.

Other changes:
- agentInput: added prompt field (task content) alongside description (task title), aligned with Claude Code's Agent tool schema. Backward compat: when prompt is empty, falls back to description to handle fewshot hallucination from the old single-field layout.
- Unified agentInput struct regardless of enableRunInBackground. When background is not enabled and the model hallucinates run_in_background=true, returns a system-reminder guiding re-invocation instead of silently running in foreground.
- Config field renames for clarity: AgentToolName→ToolName, CustomSystemPrompt→SystemPrompt (*string, since an empty string is a valid intentional override), TaskToolDescriptionGenerator→ToolDescriptionGenerator.
- StatusCancelled→StatusCanceled (American English, consistent with Go stdlib context.Canceled).
- Task.IsBackground→Task.RunInBackground, RunInput.Background→RunInput.RunInBackground for naming consistency across the API surface.
- Chinese prompts: replaced "代理" (proxy) with "智能体" (intelligent agent) throughout, the standard term in Chinese AI/LLM context.
- Prompt refresh aligned with Claude Code reference: added "Writing the prompt" section teaching agents how to brief sub-agents effectively, added anti-polling guidance for background tasks, removed low-value "Subagent lifecycle" section.
- task_output_tool and task_stop_tool rewritten with utils.InferTool for automatic JSON schema inference, reducing boilerplate.
- Replaced encoding/json with sonic for marshal/unmarshal.
- Fixed cherry-pick oversight: deep.go now passes cfg.ModelFailoverConfig to ChatModelAgentConfig instead of hardcoded nil.

Deletes adk/taskstate/ entirely (manager.go, memory.go, memory_test.go).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
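The agentInput handling described in this commit (prompt falling back to description, and a reminder when run_in_background is requested but not enabled) can be sketched as below. The JSON field names follow the tool schema in the PR summary; the struct layout, `resolveCall`, and the reminder wording are assumptions, and the standard library's encoding/json is used here only to keep the example self-contained (the commit itself switches to sonic).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// agentInput mirrors the tool-call arguments; the layout is illustrative.
type agentInput struct {
	SubagentType    string `json:"subagent_type"`
	Prompt          string `json:"prompt"`
	Description     string `json:"description"`
	RunInBackground bool   `json:"run_in_background"`
}

// Hypothetical reminder text; the real wording is not shown in this PR page.
const backgroundReminder = "<system-reminder>run_in_background is not enabled; call the tool again without it.</system-reminder>"

// resolveCall returns the task text to hand to the sub-agent, or a reminder
// for the model when background execution was requested but is unavailable.
func resolveCall(raw string, backgroundEnabled bool) (string, string, error) {
	var in agentInput
	if err := json.Unmarshal([]byte(raw), &in); err != nil {
		return "", "", err
	}
	if in.RunInBackground && !backgroundEnabled {
		return "", backgroundReminder, nil
	}
	task := in.Prompt
	if task == "" {
		task = in.Description // old single-field layout: description held the task
	}
	return task, "", nil
}

func main() {
	task, _, _ := resolveCall(`{"subagent_type":"researcher","description":"Find deprecated API usages"}`, false)
	fmt.Println(task) // prints: Find deprecated API usages

	_, reminder, _ := resolveCall(`{"subagent_type":"researcher","prompt":"x","run_in_background":true}`, false)
	fmt.Println(reminder != "") // prints: true
}
```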
…ution

When auto-background is not enabled and the caller requests foreground execution (the default path), run the agent inline instead of spawning a goroutine and blocking on a channel. The goroutine+channel pattern is only needed when we might not wait for the result (explicit background or auto-background timeout).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
418aaa2 to
e7af439
// When mgr is nil, agent runs are executed directly in the foreground with no tracking.
type agentTool struct {
	name      string
	subAgents map[string]tool.InvokableTool // for non-TaskMgr foreground path

When an agent is converted to a tool, it defaults to an InvokableTool. Now that enhanced tools are supported, shouldn't the conversion also support Enhanced Tool? A sub-agent run may ultimately return a multimodal message.
// the Message itself or Chunks of the MessageStream, as they are not copied.
// NOTE: if you have CustomizedOutput or CustomizedAction, they are NOT copied.
- func copyAgentEvent(ae *AgentEvent) *AgentEvent {
+ func CopyAgentEvent(ae *AgentEvent) *AgentEvent {

If this is only used internally, should it move into adk/internal?
sts := ae.Output.MessageOutput.MessageStream.Copy(2)
mv.MessageStream = sts[0]
copied.Output.MessageOutput.MessageStream = sts[1]
setAutomaticClose(ae)

If it moves into internal, setAutomaticClose would no longer have to be tightly coupled to copyAgentEvent, would it?
func buildSubAgentsList(ctx context.Context, cfg *Config, instruction string, handlers []adk.ChatModelAgentMiddleware) ([]adk.Agent, error) {
	var allSubAgents []adk.Agent

	if !cfg.WithoutGeneralSubAgent {

How can users add other types of sub-agents?
ToolsConfig:   cfg.ToolsConfig,
MaxIterations: cfg.MaxIteration,
Middlewares:   cfg.Middlewares,
Handlers:      append(handlers, cfg.Handlers...),

Are the middlewares of a sub-agent and its parent DeepAgent always exactly the same?
return utils.InferTool(taskOutputToolName, desc, func(ctx context.Context, input taskOutputInput) (string, error) {
	task, ok := mgr.Get(input.TaskID)
	if !ok {
		return fmt.Sprintf("Task %q not found", input.TaskID), nil

// Result is the agent's output when Status is StatusCompleted.
Result string
// Error is the error message when Status is StatusFailed.
Error string
// Using a struct instead of positional arguments for extensibility (e.g., future interrupt/resume fields).
type RunInput struct {
	// SubagentType is the name of the registered agent to execute.
	SubagentType string

Is General currently the only SubagentType?
// AutoBackgroundMs sets the automatic foreground-to-background switching timeout.
// When > 0, a foreground agent run that hasn't completed within this many
// milliseconds will automatically switch to background mode.
// When 0, auto-background is disabled (foreground runs block indefinitely).

"0 means auto-background is disabled": wouldn't nil be a better way to express that?
// It owns both the execution (Run) and the state tracking (Get/List/Cancel).
//
// TaskMgr is Agent-aware: it maintains a registry of agents (via RegisterAgent),
// resolves them by name in Run, and manages foreground/background/auto-background

"By name": is this name actually fixed? For example, how would you launch multiple 'general' agents in parallel?

It looks like the way notifications from in-background sub-agents affect the parent agent's execution has not been implemented yet?
9661935 to
d7161f7
Summary
Adds the adk/middlewares/subagent package, a ChatModelAgentMiddleware that gives any agent the ability to spawn and manage sub-agents via tool calls. This extracts and generalizes the sub-agent capability that was previously hardcoded inside the DeepAgent prebuilt, making it available as a composable middleware for any ChatModelAgent.

The middleware injects up to three tools into the agent's context:
- Agent
- TaskOutput (when TaskMgr is set)
- TaskStop (when TaskMgr is set)

Architecture
Two operating modes
Foreground-only (default): Set Config.SubAgents and the middleware injects an Agent tool. Sub-agents run synchronously: the tool call blocks until the sub-agent returns. No task tracking, no goroutine overhead. This is the simple path for agents that just need to delegate work.

With TaskMgr: Set Config.TaskMgr to enable full task lifecycle management. All agent runs (foreground and background) are tracked and visible via TaskMgr.Get/List/Notifications. The Agent tool gains a run_in_background parameter, and TaskOutput/TaskStop tools are injected automatically.

TaskMgr
TaskMgr is a concrete type (not an interface) that owns agent execution end-to-end. It maintains an agent registry, resolves agents by name, and manages three execution strategies via a single Run(*RunInput) entry point:
- Foreground (default): runs the agent inline in the calling goroutine and blocks until it returns
- Background (RunInBackground: true): spawns a goroutine, returns immediately with StatusRunning
- Auto-background (WithAutoBackground(ms)): starts in a goroutine, waits up to the timeout; if the agent finishes in time it returns the result synchronously, otherwise it returns StatusRunning and the agent continues in background

The developer-facing API provides Get, List, Cancel, HasRunning, WaitAllDone, Notifications, and Close for managing the task lifecycle from the outer loop.

Agent tool schema
Aligned with Claude Code's Agent tool:

{
  "subagent_type": "researcher",
  "prompt": "Find all usages of the deprecated API and list them with file paths",
  "description": "Find deprecated API usages",
  "run_in_background": false
}

- prompt carries the full task content; description is a short 3-5 word title
- When prompt is empty, falls back to description to handle fewshot hallucination from models trained on the old single-field layout
- When the model passes run_in_background=true on a config without TaskMgr, returns a <system-reminder> guiding re-invocation instead of failing silently

System prompt
The injected system prompt covers when to use/not use the Agent tool, parallelization guidance, and a "Writing the prompt" section teaching the agent how to brief sub-agents effectively, aligned with the Claude Code reference.

English and Chinese versions are provided via internal.SelectPrompt, with Chinese using "智能体" (the standard Chinese AI term) instead of "代理".

DeepAgent integration
DeepAgent's hardcoded task_tool.go middleware is replaced with subagent.New(), reducing ~100 lines of duplicated tool/middleware logic to a simple config.

Design decisions
TaskMgr is concrete, not an interface. There's one implementation and no realistic second consumer. A concrete type keeps the API simple (no Handle closures, no callback wiring). If a genuine second implementation appears later, we can extract an interface at that point.
No TaskStore abstraction yet. Persistent storage (for interrupt/resume, session state) was considered but deferred. Without concrete read/write requirements, we'd risk locking in the wrong contract. When interrupt/resume or session persistence lands, the requirements will drive the interface design.
No recursion prevention. The middleware is instantiated per Agent. Whether sub-agents get their own nested sub-agents is the user's architectural choice via their middleware config — not something the middleware should silently block.
Foreground path runs inline. When auto-background is not enabled and the caller requests foreground execution, the agent runs directly in the calling goroutine — no channel, no goroutine overhead. The goroutine+channel pattern is only used when we might not wait for the result.
Files
- subagent/middleware.go: BeforeAgent hook
- subagent/agent_tool.go: TaskMgr.Run
- subagent/task_mgr.go
- subagent/prompt.go
- subagent/task_output_tool.go: utils.InferTool
- subagent/task_stop_tool.go: utils.InferTool
- subagent/middleware_test.go
- subagent/task_mgr_test.go
- deep/deep.go: subagent.New() instead of hardcoded task tool middleware

Test plan
- go test -race ./adk/middlewares/subagent/... (43 tests pass)
- go test -race ./adk/prebuilt/deep/... (DeepAgent tests pass)
- go build ./adk/... (full ADK compiles clean)

🤖 Generated with Claude Code