feat(adk): add subagent middleware with background task support #951

Open
hi-pender wants to merge 61 commits into alpha/09 from feat/subagent
hi-pender (Contributor) commented Apr 12, 2026

Summary

Adds the adk/middlewares/subagent package — a ChatModelAgentMiddleware that gives any agent the ability to spawn and manage sub-agents via tool calls. This extracts and generalizes the sub-agent capability that was previously hardcoded inside the DeepAgent prebuilt, making it available as a composable middleware for any ChatModelAgent.

The middleware injects up to three tools into the agent's context:

| Tool | Injected | Description |
| --- | --- | --- |
| Agent | Always | Spawns a sub-agent by name with a prompt and description |
| TaskOutput | When TaskMgr is set | Retrieves a background task's status and result |
| TaskStop | When TaskMgr is set | Cancels a running background task |

Architecture

Two operating modes

Foreground-only (default): Set Config.SubAgents and the middleware injects an Agent tool. Sub-agents run synchronously — the tool call blocks until the sub-agent returns. No task tracking, no goroutine overhead. This is the simple path for agents that just need to delegate work.

With TaskMgr: Set Config.TaskMgr to enable full task lifecycle management. All agent runs (foreground and background) are tracked and visible via TaskMgr.Get/List/Notifications. The Agent tool gains a run_in_background parameter, and TaskOutput/TaskStop tools are injected automatically.
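
As a rough illustration of the two modes, the sketch below models which tools end up injected. `Config` here is a simplified, hypothetical stand-in (the `HasTaskMgr` flag replaces the real `Config.TaskMgr` field), and `injectedTools` is an illustrative helper rather than anything exported by the package.

```go
package main

import "fmt"

// Config is a hypothetical, simplified stand-in for the middleware config.
type Config struct {
	SubAgents  []string // sub-agent names only, for illustration
	HasTaskMgr bool     // assumption: stands in for "Config.TaskMgr != nil"
}

// injectedTools returns the tool names the middleware would add in each mode.
func injectedTools(cfg Config) []string {
	tools := []string{"Agent"} // always injected
	if cfg.HasTaskMgr {
		// Task lifecycle tools only make sense when tasks are tracked.
		tools = append(tools, "TaskOutput", "TaskStop")
	}
	return tools
}

func main() {
	fmt.Println(injectedTools(Config{SubAgents: []string{"researcher"}}))
	fmt.Println(injectedTools(Config{SubAgents: []string{"researcher"}, HasTaskMgr: true}))
}
```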

TaskMgr

TaskMgr is a concrete type (not an interface) that owns agent execution end-to-end. It maintains an agent registry, resolves agents by name, and manages three execution strategies via a single Run(*RunInput) entry point:

  • Foreground — runs inline, returns when done (no goroutine)
  • Background (RunInBackground: true) — spawns a goroutine, returns immediately with StatusRunning
  • Auto-background (WithAutoBackground(ms)) — starts in a goroutine, waits up to the timeout; if the agent finishes in time it returns the result synchronously, otherwise it returns StatusRunning and the agent continues in background
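
The difference between the foreground and auto-background strategies can be sketched with a goroutine, a buffered channel, and a timer. This is a minimal model under stated assumptions, not the TaskMgr implementation: `runForeground`, `runAutoBackground`, the string-valued `Status`, and the `quick`/`slow` helpers are all hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"time"
)

// Status is a hypothetical string-valued task status.
type Status string

const (
	StatusRunning   Status = "running"
	StatusCompleted Status = "completed"
)

// runForeground runs the work inline in the calling goroutine.
func runForeground(work func() string) (Status, string) {
	return StatusCompleted, work()
}

// runAutoBackground starts the work in a goroutine and waits up to timeoutMs.
// pending is non-nil only when the work is still running on return.
func runAutoBackground(work func() string, timeoutMs int) (status Status, result string, pending <-chan string) {
	done := make(chan string, 1) // buffered so the goroutine never blocks on send
	go func() { done <- work() }()

	timer := time.NewTimer(time.Duration(timeoutMs) * time.Millisecond)
	defer timer.Stop()

	select {
	case res := <-done:
		return StatusCompleted, res, nil // finished in time: synchronous result
	case <-timer.C:
		return StatusRunning, "", done // keeps running in the background
	}
}

func quick() string { return "quick result" }

func slow() string {
	time.Sleep(200 * time.Millisecond)
	return "slow result"
}

func main() {
	st, res := runForeground(quick)
	fmt.Println(st, res)

	st, res, _ = runAutoBackground(quick, 500)
	fmt.Println(st, res)

	st, _, pending := runAutoBackground(slow, 10)
	fmt.Println(st)
	fmt.Println(<-pending) // the result arrives later
}
```

Explicit background execution corresponds to the same goroutine path without the wait.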

The developer-facing API provides Get, List, Cancel, HasRunning, WaitAllDone, Notifications, and Close for managing the task lifecycle from the outer loop.

Agent tool schema

Aligned with Claude Code's Agent tool:

{
  "subagent_type": "researcher",
  "prompt": "Find all usages of the deprecated API and list them with file paths",
  "description": "Find deprecated API usages",
  "run_in_background": false
}
  • prompt carries the full task content; description is a short 3-5 word title
  • Backward compatible: when prompt is empty, the tool falls back to description, which covers few-shot hallucination from models trained on the old single-field layout
  • When the model hallucinates run_in_background=true on a config without TaskMgr, returns a <system-reminder> guiding re-invocation instead of failing silently
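
The two fallback rules above could look roughly like the following. This is a sketch only: the `agentInput` field tags mirror the JSON schema shown earlier, but `normalize`, its signature, and the reminder wording are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// agentInput mirrors the JSON schema of the Agent tool call.
type agentInput struct {
	SubagentType    string `json:"subagent_type"`
	Prompt          string `json:"prompt"`
	Description     string `json:"description"`
	RunInBackground bool   `json:"run_in_background"`
}

// normalize parses the raw arguments and applies both fallbacks.
// A non-empty reminder means the call should be answered with that text
// instead of running the sub-agent. (Hypothetical helper.)
func normalize(raw string, backgroundEnabled bool) (agentInput, string, error) {
	var in agentInput
	if err := json.Unmarshal([]byte(raw), &in); err != nil {
		return in, "", err
	}
	if in.Prompt == "" {
		in.Prompt = in.Description // old single-field layout
	}
	if in.RunInBackground && !backgroundEnabled {
		// Reminder wording is illustrative, not the package's actual text.
		return in, "<system-reminder>run_in_background is not available here; call the tool again without it.</system-reminder>", nil
	}
	return in, "", nil
}

func main() {
	in, reminder, err := normalize(`{"subagent_type":"researcher","description":"Find deprecated API usages"}`, false)
	fmt.Println(in.Prompt, reminder == "", err)

	_, reminder, err = normalize(`{"subagent_type":"researcher","prompt":"p","run_in_background":true}`, false)
	fmt.Println(reminder != "", err)
}
```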

System prompt

The injected system prompt covers when to use/not use the Agent tool, parallelization guidance, and a "Writing the prompt" section teaching the agent how to brief sub-agents effectively — aligned with the Claude Code reference.

English and Chinese versions are provided via internal.SelectPrompt, with Chinese using "智能体" (the standard Chinese AI term) instead of "代理".

DeepAgent integration

DeepAgent's hardcoded task_tool.go middleware is replaced with subagent.New(), reducing ~100 lines of duplicated tool/middleware logic to a simple config:

subagent.New(ctx, &subagent.Config{
    SubAgents:                allSubAgents,
    ToolName:                 taskToolName,
    ToolDescriptionGenerator: cfg.TaskToolDescriptionGenerator,
})

Design decisions

TaskMgr is concrete, not an interface. There's one implementation and no realistic second consumer. A concrete type keeps the API simple (no Handle closures, no callback wiring). If a genuine second implementation appears later, we can extract an interface at that point.

No TaskStore abstraction yet. Persistent storage (for interrupt/resume, session state) was considered but deferred. Without concrete read/write requirements, we'd risk locking in the wrong contract. When interrupt/resume or session persistence lands, the requirements will drive the interface design.

No recursion prevention. The middleware is instantiated per Agent. Whether sub-agents get their own nested sub-agents is the user's architectural choice via their middleware config — not something the middleware should silently block.

Foreground path runs inline. When auto-background is not enabled and the caller requests foreground execution, the agent runs directly in the calling goroutine — no channel, no goroutine overhead. The goroutine+channel pattern is only used when we might not wait for the result.

Files

| File | Description |
| --- | --- |
| subagent/middleware.go | Config, validation, middleware initialization, BeforeAgent hook |
| subagent/agent_tool.go | Agent tool implementation: routes to direct invocation or TaskMgr.Run |
| subagent/task_mgr.go | TaskMgr: agent registry, task lifecycle, foreground/background/auto-background |
| subagent/prompt.go | System prompts and tool descriptions (English + Chinese) |
| subagent/task_output_tool.go | TaskOutput tool via utils.InferTool |
| subagent/task_stop_tool.go | TaskStop tool via utils.InferTool |
| subagent/middleware_test.go | Middleware + tool integration tests (19 tests) |
| subagent/task_mgr_test.go | TaskMgr unit tests (24 tests) |
| deep/deep.go | Migrated to use subagent.New() instead of hardcoded task tool middleware |

Test plan

  • go test -race ./adk/middlewares/subagent/... — 43 tests pass
  • go test -race ./adk/prebuilt/deep/... — DeepAgent tests pass
  • go build ./adk/... — full ADK compiles clean

🤖 Generated with Claude Code

hi-pender changed the title from "refactor(adk): consolidate TaskMgr into subagent package with API redesign" to "feat(adk): add subagent middleware with background task support" on Apr 12, 2026
codecov bot commented Apr 12, 2026

Codecov Report

❌ Patch coverage is 85.74561% with 65 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (alpha/09@9661935). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| adk/prebuilt/deep/deep.go | 33.33% | 22 Missing and 2 partials ⚠️ |
| adk/middlewares/subagent/agent_tool.go | 74.68% | 14 Missing and 6 partials ⚠️ |
| adk/middlewares/subagent/task_mgr.go | 95.23% | 7 Missing and 4 partials ⚠️ |
| adk/middlewares/subagent/middleware.go | 88.73% | 4 Missing and 4 partials ⚠️ |
| adk/middlewares/subagent/task_output_tool.go | 90.47% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##             alpha/09     #951   +/-   ##
===========================================
  Coverage            ?   82.15%           
===========================================
  Files               ?      167           
  Lines               ?    20680           
  Branches            ?        0           
===========================================
  Hits                ?    16989           
  Misses              ?     2498           
  Partials            ?     1193           

mrh997 and others added 24 commits April 14, 2026 14:50
feat(agentic_model):
- format print
- support agentic chat template
- support composing agentic model & agentic tools node
- support agentic tool node
- support agentic message concat
meguminnnnnnnnn and others added 18 commits April 14, 2026 14:50
Change-Id: If20fa78dba82a1c177c8ec47090050ea8c1354ed
* fix(adk): skip saving checkpoint when TurnLoop is idle

When Stop() is called on an idle TurnLoop (no active agent run, no
unhandled items, no canceled items), the resulting checkpoint contains
no meaningful state. Skip saving such checkpoints to avoid unnecessary
store writes.

- Add isIdle check in cleanup() before checkpoint save decision
- Add TestTurnLoop_StopWhileIdle_SkipsCheckpoint test

Change-Id: I6aeaff5ed5833a971cb95298193fdb96d904baf8

* fix(internal): merge id2State in PopulateInterruptState instead of replacing

PopulateInterruptState merged id2Addr entries one by one but replaced
id2State wholesale. In a parallel workflow resume, two goroutines share
the same globalResumeInfo. If one goroutine's compose graph called
PopulateInterruptState (replacing id2State with compose-only entries)
before the other goroutine looked up its outer-level entry, the lookup
returned a zero-value InterruptState with State=nil, triggering the
'has no state' panic in ChatModelAgent.Resume.

Change id2State handling to merge entry by entry, consistent with
id2Addr.

Change-Id: Ia21f65289bff7beb2bc383fb033926ad9c92d7e7
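
The difference between replacing and merging can be shown on plain maps. The sketch below is single-goroutine and leaves out whatever synchronization the real code uses; `interruptState`, `replaceStates`, and `mergeStates` are simplified, hypothetical stand-ins for the internals named in the commit message.

```go
package main

import "fmt"

// interruptState is a simplified stand-in; a nil State models "has no state".
type interruptState struct{ State any }

// replaceStates models the old behavior: incoming entries replace the whole map.
func replaceStates(existing, incoming map[string]interruptState) map[string]interruptState {
	return incoming
}

// mergeStates models the fix: copy entry by entry, keeping existing entries.
func mergeStates(existing, incoming map[string]interruptState) map[string]interruptState {
	for id, st := range incoming {
		existing[id] = st
	}
	return existing
}

func main() {
	composeOnly := map[string]interruptState{"inner": {State: "graph state"}}

	replaced := replaceStates(map[string]interruptState{"outer": {State: "agent state"}}, composeOnly)
	fmt.Println(replaced["outer"].State == nil) // outer entry lost: zero value

	merged := mergeStates(map[string]interruptState{"outer": {State: "agent state"}}, composeOnly)
	fmt.Println(merged["outer"].State, merged["inner"].State)
}
```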

* fix(adk): keep watching for cancel escalation after stopSig.done

When watchStopSignal entered the stopSig.done branch, it processed the
initial cancel and then blocked on <-done (turn completion), never
looping back to check notify. This meant a subsequent Stop() call with
a higher cancel mode (e.g. CancelImmediate) was never forwarded to the
agent, causing TestTurnLoop_Stop_EscalatesCancelMode to time out.

Replace the blocking <-done with an inner loop that selects on both
done and notify, so escalation signals are always delivered. Also apply
the generation-based dedup check consistent with the notify branch.

Change-Id: Ia6a04d00a2b44625ffbcb625ff0e559c12ed145f
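
A watcher loop that keeps selecting on both channels might look like the sketch below. The `stopSignal` shape, the integer generation counter used for dedup, and the `watchEscalations` and `demo` helpers are assumptions for illustration; only the cancel-mode names come from the commit messages.

```go
package main

import "fmt"

// stopSignal is a hypothetical signal carrying a generation and a cancel mode.
type stopSignal struct {
	gen  int
	mode string
}

// watchEscalations forwards every new signal until the turn completes.
func watchEscalations(done <-chan struct{}, notify <-chan stopSignal, forward func(stopSignal)) {
	lastGen := 0
	for {
		select {
		case <-done:
			return // turn finished, nothing left to escalate
		case sig := <-notify:
			if sig.gen <= lastGen {
				continue // generation already forwarded
			}
			lastGen = sig.gen
			forward(sig)
		}
	}
}

// demo sends an initial cancel, a duplicate, and an escalation.
func demo() []string {
	done := make(chan struct{})
	notify := make(chan stopSignal) // unbuffered: each send waits for the watcher
	finished := make(chan struct{})
	var forwarded []string

	go func() {
		defer close(finished)
		watchEscalations(done, notify, func(s stopSignal) {
			forwarded = append(forwarded, s.mode)
		})
	}()

	notify <- stopSignal{gen: 1, mode: "CancelAfterChatModel"}
	notify <- stopSignal{gen: 1, mode: "CancelAfterChatModel"} // duplicate
	notify <- stopSignal{gen: 2, mode: "CancelImmediate"}      // escalation
	close(done)
	<-finished
	return forwarded
}

func main() {
	fmt.Println(demo())
}
```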
…r agent cancellation (#929)

* fix(adk): prevent panic when orphaned tool goroutine sends event after agent cancellation

When CancelAfterChatModel times out and escalates to CancelImmediate,
GraphInterrupt fires with timeout=0. The compose graph returns immediately,
orphaning parallel tool goroutines. When an orphaned tool completes,
eventSenderToolWrapper tries to send an event via the AsyncGenerator which
is already closed, causing 'send on closed channel' panic.

- Add isImmediateCancelled() to cancelContext for checking immediateChan
- Make chatModelAgentExecCtx.send cancel-aware: skip send when immediate cancel is active
- Use trySend as safety net for the TOCTOU race window
- Route SendEvent() through execCtx.send() instead of direct generator.Send()

Change-Id: Ic7e0194c860e2692a3cddc559911ab379024f650
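
One way to picture the cancel-aware send plus safety net is shown below. It is a sketch under assumptions: `eventSender` and its fields are hypothetical, and recovering from the send panic is just one possible way a `trySend` helper could work, not necessarily how the commit implements it.

```go
package main

import "fmt"

// eventSender is a hypothetical stand-in for the exec context and generator.
type eventSender struct {
	immediate chan struct{} // closed when immediate cancel fires
	events    chan string   // closed when the generator shuts down
}

// isImmediateCancelled reports whether immediate cancel is active, without blocking.
func (s *eventSender) isImmediateCancelled() bool {
	select {
	case <-s.immediate:
		return true
	default:
		return false
	}
}

// trySend sends on ch and reports false instead of panicking if ch is closed.
// It still blocks if ch is open but has no room and no receiver.
func trySend(ch chan<- string, ev string) (sent bool) {
	defer func() {
		if r := recover(); r != nil {
			sent = false // send on closed channel
		}
	}()
	ch <- ev
	return true
}

// send skips the event under immediate cancel; trySend covers the race window
// where the channel is closed after the check but before the send.
func (s *eventSender) send(ev string) bool {
	if s.isImmediateCancelled() {
		return false
	}
	return trySend(s.events, ev)
}

func main() {
	s := &eventSender{immediate: make(chan struct{}), events: make(chan string, 1)}
	fmt.Println(s.send("tool finished")) // delivered

	close(s.events)                    // generator closed, cancel flag not yet visible
	fmt.Println(s.send("late event")) // swallowed instead of panicking
}
```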

* test(adk): add test for orphaned tool goroutine panic after CancelImmediate

- unit_send_after_close: directly reproduces the panic by sending to a
  closed generator with isImmediateCancelled=true
- unit_send_after_close_without_cancel_ctx: verifies trySend safety net
  prevents panic even without cancelCtx
- integration_cancel_escalation_orphans_tool: end-to-end test with slow
  tool, CancelAfterChatModel timeout escalation, and orphaned goroutine

Change-Id: Ia82fa957b102ccc2ac42094d18d4b15db2a1701c

* test(adk): improve coverage for orphaned tool goroutine fix

Add test cases for:
- nil execCtx and nil generator defensive guards
- nil cancelContext in isImmediateCancelled
- TOCTOU race window (isImmediateCancelled=false but generator closed)
- SendEvent public API with closed generator
- SendEvent without exec context

Change-Id: I197c36f34675f5376cbe5f830b15db6ca873cd1f
…925)

* fix(adk): keep late turn loop items

Change-Id: Iabee0c25a83d5a25585d3592a41ca6a5fba35c2b

* docs(adk): clarify cancel wait semantics

Change-Id: Ia0a396b9cc2e43f15e85056d966f20b010dcd2b6

* feat(adk): add WithSkipCheckpoint and WithStopCause StopOptions

Add two new StopOption variants for TurnLoop.Stop():

- WithSkipCheckpoint: prevents checkpoint persistence on stop, for
  cases where the caller does not intend to resume in the future.
  The flag is sticky across escalation calls.

- WithStopCause: attaches a business-supplied reason string. Surfaced
  in TurnLoopExitState.StopCause and, after the Stopped channel
  closes, via TurnContext.StopCause(). Uses first-non-empty-wins
  semantics across multiple Stop() calls.

Thread both fields through stopSignal with proper mutex protection.
Update cleanup() to skip checkpoint save when skipCheckpoint is set.

Change-Id: Ifeat-stop-options-skip-checkpoint-stop-cause
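
The sticky flag and first-non-empty-wins semantics can be sketched as follows. `stopState` and its methods are hypothetical names; the real fields live on stopSignal and are set through StopOptions rather than positional arguments.

```go
package main

import (
	"fmt"
	"sync"
)

// stopState is a hypothetical holder for the two stop options.
type stopState struct {
	mu             sync.Mutex
	skipCheckpoint bool
	stopCause      string
}

// stop records one Stop() call.
func (s *stopState) stop(skipCheckpoint bool, cause string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if skipCheckpoint {
		s.skipCheckpoint = true // sticky: later calls cannot clear it
	}
	if s.stopCause == "" {
		s.stopCause = cause // first non-empty cause wins
	}
}

// snapshot returns the current values under the lock.
func (s *stopState) snapshot() (bool, string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.skipCheckpoint, s.stopCause
}

func main() {
	var s stopState
	s.stop(true, "")           // skip checkpoint, no cause yet
	s.stop(false, "user quit") // escalation supplies a cause
	s.stop(false, "timeout")   // later cause is ignored
	fmt.Println(s.snapshot())
}
```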
* fix: rebase error

Change-Id: If20fa78dba82a1c177c8ec47090050ea8c1354ed

* feat(adk): add failover support for ChatModel

Change-Id: Ice1b513b4b509e7b540316da9119ff3d529c9bae

* feat(adk): add failover support for ChatModel

Change-Id: Ice1b513b4b509e7b540316da9119ff3d529c9bae

* feat(adk): add failover support for ChatModel

Change-Id: Id5483447b74322f6dd495bdd3b994c001094569d

* feat(adk): make Name and Description optional in ChatModelAgentConfig

* feat(adk): add callback lifecycle management to failoverProxyModel

- Extract prepareCallbacks method to reuse callback setup logic between
  Generate and Stream methods
- Add callbacks.ReuseHandlers with proper RunInfo (model type + component)
  before each failover model invocation so handlers receive correct identity
- Add explicit OnStart/OnEnd/OnError callback invocations in Generate and
  Stream since failoverProxyModel declares IsCallbacksEnabled() = true and
  the outer layer skips automatic callback injection

Change-Id: I0150529024125251828cf6f77c8247aa464b1f84

* fix(adk): preserve partial result in failoverProxyModel.Generate on error

Return result instead of nil when target.Generate fails, so that the
outer failoverModelWrapper can pass the partial output message to
ShouldFailover for inspection.

Change-Id: I32d86151a6e133f1a58d5e988bccf42d831a646c

* refactor(adk): use EnsureRunInfo in failoverProxyModel and separate ctx for callbacks

- Replace manual RunInfo construction + ReuseHandlers with
  callbacks.EnsureRunInfo for cleaner RunInfo setup
- Use nCtx (from EnsureRunInfo) for target model invocation and
  original ctx for OnStart/OnEnd/OnError callback lifecycle

Change-Id: I1d5982d0e1ceeaf8f6648b9c40c229b6a2b07ab8

---------

Co-authored-by: shentong.martin <shentong.martin@bytedance.com>
feat: tool search definition
…945)

- Add ToolAliases to prepareExecContext when building ToolsNodeConfig
- Add UnknownToolsHandler, ExecuteSequentially, ToolArgumentsHandler,
  and ToolAliases to applyBeforeAgent when rebuilding after BeforeAgent
  handlers modify tools
- Add tests covering argument alias remapping, name alias dispatch,
  alias preservation after handler rebuild, and handler-only tool
  registration with pre-configured aliases
hi-pender and others added 6 commits April 14, 2026 15:54
…esign

Merge the task management layer (previously adk/taskstate/) into
adk/middlewares/subagent/, since TaskMgr is inherently Agent-aware: it
owns execution (Run), resolves agents by name, and manages
foreground/background/auto-background switching. Keeping it in a
separate generic package added indirection without real reuse.

Key design decisions:

- TaskMgr is a concrete type, not an interface. The old Manager
  interface (Register+Handle closures) is replaced by a single
  Run(*RunInput) method that accepts an agent name and delegates
  internally. This eliminates the split-ownership problem where the
  caller had to wire up Handle.Complete/Fail callbacks manually.

- TaskMgr owns an agent registry (RegisterAgent) so it can resolve
  agents by name in Run(). This lays groundwork for future
  interrupt/resume support where TaskMgr needs to re-invoke agents.

- TaskStore abstraction is deferred. The interface was defined but
  never consumed — no code path reads from or writes to it. Rather
  than ship an unvalidated abstraction, we remove it and will
  introduce it when interrupt/resume or session persistence actually
  needs it. At that point the requirements will be concrete enough
  to design a correct interface on the first try.

- Removed recursion prevention (subagentCtxKey). The middleware is
  instantiated per Agent; whether sub-agents get their own sub-agent
  tools is the user's choice via their middleware config. The old
  context-marker guard prevented legitimate nesting.

Other changes:

- agentInput: added prompt field (task content) alongside description
  (task title), aligned with Claude Code's Agent tool schema. Backward
  compat: when prompt is empty, falls back to description to handle
  fewshot hallucination from the old single-field layout.

- Unified agentInput struct regardless of enableRunInBackground. When
  background is not enabled and the model hallucinates
  run_in_background=true, returns a system-reminder guiding re-invocation
  instead of silently running in foreground.

- Config field renames for clarity: AgentToolName→ToolName,
  CustomSystemPrompt→SystemPrompt (*string, since empty string is a
  valid intentional override), TaskToolDescriptionGenerator→ToolDescriptionGenerator.

- StatusCancelled→StatusCanceled (American English, consistent with
  Go stdlib context.Canceled).

- Task.IsBackground→Task.RunInBackground, RunInput.Background→RunInput.RunInBackground
  for naming consistency across the API surface.

- Chinese prompts: replaced "代理" (proxy) with "智能体" (intelligent
  agent) throughout — the standard term in Chinese AI/LLM context.

- Prompt refresh aligned with Claude Code reference: added "Writing
  the prompt" section teaching agents how to brief sub-agents
  effectively, added anti-polling guidance for background tasks,
  removed low-value "Subagent lifecycle" section.

- task_output_tool and task_stop_tool rewritten with utils.InferTool
  for automatic JSON schema inference, reducing boilerplate.

- Replaced encoding/json with sonic for marshal/unmarshal.

- Fixed cherry-pick oversight: deep.go now passes cfg.ModelFailoverConfig
  to ChatModelAgentConfig instead of hardcoded nil.

Deletes adk/taskstate/ entirely (manager.go, memory.go, memory_test.go).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ution

When auto-background is not enabled and the caller requests foreground
execution (the default path), run the agent inline instead of spawning
a goroutine and blocking on a channel. The goroutine+channel pattern
is only needed when we might not wait for the result (explicit
background or auto-background timeout).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
// When mgr is nil, agent runs are executed directly in the foreground with no tracking.
type agentTool struct {
name string
subAgents map[string]tool.InvokableTool // for non-TaskMgr foreground path
Review comment (Contributor):

When an agent is converted to a tool, the default is an InvokableTool. Now that enhanced tools are supported, shouldn't this also support conversion to an enhanced tool? A sub-agent run may ultimately return a multimodal message.

Comment thread adk/utils.go
// the Message itself or Chunks of the MessageStream, as they are not copied.
// NOTE: if you have CustomizedOutput or CustomizedAction, they are NOT copied.
func copyAgentEvent(ae *AgentEvent) *AgentEvent {
func CopyAgentEvent(ae *AgentEvent) *AgentEvent {
Review comment (Contributor):

If this is only used internally, should it go in adk/internal?

Comment thread adk/utils.go
sts := ae.Output.MessageOutput.MessageStream.Copy(2)
mv.MessageStream = sts[0]
copied.Output.MessageOutput.MessageStream = sts[1]
setAutomaticClose(ae)
Review comment (Contributor):

If it moves to internal, setAutomaticClose would no longer need to be tightly coupled to copyAgentEvent, right?

Comment thread adk/prebuilt/deep/deep.go
func buildSubAgentsList(ctx context.Context, cfg *Config, instruction string, handlers []adk.ChatModelAgentMiddleware) ([]adk.Agent, error) {
var allSubAgents []adk.Agent

if !cfg.WithoutGeneralSubAgent {
Review comment (Contributor):

How do users add other types of sub-agents?

Comment thread adk/prebuilt/deep/deep.go
ToolsConfig: cfg.ToolsConfig,
MaxIterations: cfg.MaxIteration,
Middlewares: cfg.Middlewares,
Handlers: append(handlers, cfg.Handlers...),
Review comment (Contributor):

Are the sub-agent's middlewares always exactly the same as the parent DeepAgent's?

return utils.InferTool(taskOutputToolName, desc, func(ctx context.Context, input taskOutputInput) (string, error) {
task, ok := mgr.Get(input.TaskID)
if !ok {
return fmt.Sprintf("Task %q not found", input.TaskID), nil
Review comment (Contributor):

Does this not return an error?

// Result is the agent's output when Status is StatusCompleted.
Result string
// Error is the error message when Status is StatusFailed.
Error string
Review comment (Contributor):

This is also a string. Is that for the same reason?

// Using a struct instead of positional arguments for extensibility (e.g., future interrupt/resume fields).
type RunInput struct {
// SubagentType is the name of the registered agent to execute.
SubagentType string
Review comment (Contributor):

Is General currently the only SubAgentType?

// AutoBackgroundMs sets the automatic foreground-to-background switching timeout.
// When > 0, a foreground agent run that hasn't completed within this many
// milliseconds will automatically switch to background mode.
// When 0, auto-background is disabled (foreground runs block indefinitely).
Review comment (Contributor):

"0 means auto-background is disabled": wouldn't nil be better?

// It owns both the execution (Run) and the state tracking (Get/List/Cancel).
//
// TaskMgr is Agent-aware: it maintains a registry of agents (via RegisterAgent),
// resolves them by name in Run, and manages foreground/background/auto-background
Review comment (Contributor):

"By name": is this name effectively fixed? For example, how would you launch multiple "general" agents in parallel?

Review comment (Contributor):

It looks like how notifications from in-background sub-agents affect the parent agent's execution is not implemented yet?

7 participants