fix(mcp): add timeout to background thread join in stop() to prevent hangs #1787

Closed
giulio-leone wants to merge 1 commit into strands-agents:main from giulio-leone:fix/mcp-client-stop-join-timeout
@giulio-leone

Summary

Fixes #1732

When an Agent holding an MCPClient goes out of scope inside a function, the Agent.__del__ finalizer calls MCPClient.stop(), which in turn calls _background_thread.join(). If the background thread cannot exit promptly (e.g. transport subprocess teardown is slow), join() blocks indefinitely and the entire process hangs on exit.

Root Cause

_background_thread.join() at line 356 of mcp_client.py has no timeout. When called from the GC finalizer (Agent.__del__ → ToolRegistry.cleanup() → MCPClient.remove_consumer() → MCPClient.stop()), the transport subprocess may not shut down quickly, causing the join to block forever.
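To illustrate the blocking behavior described above, here is a minimal sketch using only the standard library. The `slow_teardown` function and the 0.2 s delay are assumptions made for the demo; they stand in for a transport subprocess that is slow to shut down and are not taken from mcp_client.py.

```python
import threading
import time


def slow_teardown(release: threading.Event) -> None:
    # Hypothetical stand-in for a background thread whose shutdown is delayed.
    release.wait()


release = threading.Event()
worker = threading.Thread(target=slow_teardown, args=(release,), daemon=True)
worker.start()

# Release the worker after a short delay so this demo terminates.
# Without this timer, the join() below would never return.
threading.Timer(0.2, release.set).start()

start = time.monotonic()
worker.join()  # no timeout: blocks for as long as the thread is alive
elapsed = time.monotonic() - start
print(f"join() blocked for {elapsed:.2f}s")
```

The call to join() returns only once the thread has finished, so the caller's wait is bounded solely by how long the thread takes to exit.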

Fix

Add timeout=self._startup_timeout (default 30s) to the join() call. If the thread does not exit within the timeout, log a warning and continue cleanup. The thread is already a daemon thread (daemon=True), so it does not keep the interpreter alive and is terminated when the process exits.
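The pattern described above can be sketched as follows. `BackgroundWorker`, its `cleaned_up` flag, and the warning text are hypothetical names invented for illustration; only the `join(timeout=self._startup_timeout)` call and the `is_alive()` check mirror what this PR describes. Note that `Thread.join()` always returns `None`, so `is_alive()` is the only way to tell whether the timeout elapsed.

```python
import logging
import threading
from typing import Optional

logger = logging.getLogger(__name__)


class BackgroundWorker:
    """Hypothetical stand-in for a client that owns a background thread."""

    def __init__(self, startup_timeout: float = 30.0) -> None:
        self._startup_timeout = startup_timeout
        self._release = threading.Event()
        # Daemon thread: it will not keep the interpreter alive at exit.
        self._background_thread: Optional[threading.Thread] = threading.Thread(
            target=self._release.wait, daemon=True
        )
        self._background_thread.start()
        self.cleaned_up = False

    def stop(self) -> None:
        thread = self._background_thread
        if thread is not None:
            # Bounded wait instead of an unbounded join().
            thread.join(timeout=self._startup_timeout)
            if thread.is_alive():
                logger.warning(
                    "background thread did not exit within %.2fs; continuing cleanup",
                    self._startup_timeout,
                )
        # Cleanup proceeds whether or not the thread has exited.
        self._background_thread = None
        self.cleaned_up = True


# Usage: the thread never exits on its own, yet stop() still returns.
worker = BackgroundWorker(startup_timeout=0.05)
stuck_thread = worker._background_thread
worker.stop()
```

Because the stuck thread is a daemon, the script still exits normally after stop() returns.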

Changes

src/strands/tools/mcp/mcp_client.py

  • Changed self._background_thread.join() to self._background_thread.join(timeout=self._startup_timeout)
  • Added is_alive() check after join to log a warning if the thread didn't exit

tests/strands/tools/mcp/test_mcp_client.py

  • Updated existing tests to expect join(timeout=...)
  • Added test_stop_does_not_hang_when_join_times_out — verifies cleanup proceeds even when thread is still alive after join timeout
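A test of this shape might look like the sketch below. It does not reproduce the actual test in tests/strands/tools/mcp/test_mcp_client.py; `FakeClient` is a hypothetical class that copies only the join-with-timeout logic, and the thread is replaced with a `MagicMock` so the test never waits.

```python
from unittest.mock import MagicMock


class FakeClient:
    """Hypothetical stand-in; not the real MCPClient."""

    def __init__(self, thread, startup_timeout: float = 30.0) -> None:
        self._background_thread = thread
        self._startup_timeout = startup_timeout
        self.cleaned_up = False

    def stop(self) -> None:
        if self._background_thread is not None:
            self._background_thread.join(timeout=self._startup_timeout)
            # The real change logs a warning here when is_alive() is True.
            self._background_thread.is_alive()
        self._background_thread = None
        self.cleaned_up = True


def test_stop_proceeds_when_join_times_out() -> None:
    thread = MagicMock()
    # Simulate a thread that is still running after the join timeout.
    thread.is_alive.return_value = True

    client = FakeClient(thread, startup_timeout=30.0)
    client.stop()

    # join() must be bounded, and cleanup must still complete.
    thread.join.assert_called_once_with(timeout=30.0)
    assert client._background_thread is None
    assert client.cleaned_up


test_stop_proceeds_when_join_times_out()
```

Mocking the thread keeps the test fast: the mocked join() returns immediately, and the `is_alive.return_value = True` setting simulates the timed-out case.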

Test Results

127 passed (all MCP tests), 0 failures

…hangs

When an Agent holding an MCPClient goes out of scope inside a function,
the Agent.__del__ finalizer calls MCPClient.stop() which calls
_background_thread.join(). If the background thread cannot exit promptly
(e.g. transport subprocess teardown is slow), join() blocks indefinitely,
causing the entire process to hang on exit.

Add a timeout (equal to startup_timeout, default 30s) to the join() call.
If the thread does not exit within the timeout, log a warning and proceed
with cleanup. The thread is already a daemon thread (daemon=True), so it
will be cleaned up by the interpreter on process exit.

Fixes strands-agents#1732

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@giulio-leone
Author

Closing — focusing on higher-impact contributions.

Development

Successfully merging this pull request may close these issues.

[BUG] Managed MCPClient integration hangs on exit