3rd Party MCP - Forge Backend Tool Aggregation + Proxy #7560

### Summary

The MCP endpoint (`POST /api/v1/mcp`) aggregates tools from two sources, the platform tools and the expert agent's tools, into a unified `tools/list` and routes each `tools/call` to the right executor. 3rd party agents talk only to the Forge backend, never directly to the expert agent.

### Why

The MCP endpoint is the single entry point for 3rd party agents. It needs to present a unified tool list that combines platform tools (defined in code on the Forge backend) with the first-party (expert) agent's tools (flow/NR tools, frontend tools like context/navigation/routes). The 3rd party agent never needs to know which backend handles which tool.

The first-party agent's tool list is fetched over MQTT rather than HTTP. MQTT gives better control for horizontal scaling and per-customer isolation later. The Forge backend sends an MQTT request to the expert agent asking for its tool list, the agent responds, and the Forge backend merges the result with platform tools.

For tool execution, the Forge backend routes based on tool source: platform tools are executed via `app.inject()`, while agent tools are proxied to the expert agent via MQTT. PAT enforcement (#7559) runs at the Forge layer before any execution or proxying.

### What to do

**Tool aggregation (`tools/list`)**
* Fetch the expert agent's tool list via MQTT request/response
* Merge with static platform tool definitions (from #7430)
* Return a unified tool list to the 3rd party agent
* Handle expert agent unavailability gracefully (return platform tools only, or structured error depending on the failure mode)
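
The aggregation steps above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `listTools`, `fetchAgentTools`, `sourceByName`, and `agentAvailable` are hypothetical names, `fetchAgentTools` stands in for the MQTT request/response round trip, and the rule that platform tools win on a name clash is an assumption the issue does not state.

```javascript
// Hypothetical helper: merges static platform tools with the expert agent's tools.
// `fetchAgentTools` is a stand-in for the MQTT request/response round trip.
async function listTools (platformTools, fetchAgentTools) {
    const tools = new Map() // name -> tool definition
    const sourceByName = new Map() // name -> 'platform' | 'agent', used later for routing
    for (const tool of platformTools) {
        tools.set(tool.name, tool)
        sourceByName.set(tool.name, 'platform')
    }
    let agentAvailable = true
    try {
        const agentTools = await fetchAgentTools()
        for (const tool of agentTools) {
            // Assumption: on a name clash the platform tool wins.
            if (!tools.has(tool.name)) {
                tools.set(tool.name, tool)
                sourceByName.set(tool.name, 'agent')
            }
        }
    } catch (err) {
        // Graceful degradation: the caller still gets the platform tools.
        agentAvailable = false
    }
    return { tools: [...tools.values()], sourceByName, agentAvailable }
}
```

Keeping the source map separate from the tool definitions means the routing information never leaks into the `tools/list` response. Whether a given failure mode should degrade like this or surface a structured error is left open here, as in the bullet above.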

**Tool routing (`tools/call`)**
* Determine which source owns the called tool
* Platform tool: execute via `app.inject()` with forwarded PAT (existing pattern from #7430)
* Agent tool: proxy to the expert agent via MQTT request/response, await result, return to caller
* PAT enforcement (#7559) runs before either path
* Timeout handling for agent tool calls (expert agent or browser unresponsive)
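
A rough sketch of the routing flow described above, with the executors injected so the control flow is visible in isolation. Everything here is illustrative: `callTool`, `withTimeout`, `enforcePat`, `runPlatformTool`, `proxyToAgent`, the error shape, and the 30 second default timeout are assumptions, not names or values taken from the Forge codebase.

```javascript
// Rejects with a `timeout` error if `promise` does not settle within `ms`.
function withTimeout (promise, ms, label) {
    let timer
    const timeout = new Promise((resolve, reject) => {
        timer = setTimeout(() => {
            const err = new Error(`${label} timed out after ${ms}ms`)
            err.code = 'timeout'
            reject(err)
        }, ms)
    })
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}

// Hypothetical dispatcher. `deps` carries stand-ins for the real collaborators:
//   sourceByName    Map of tool name -> 'platform' | 'agent'
//   enforcePat      returns a denial message, or null when the call is allowed
//   runPlatformTool stand-in for the app.inject() path
//   proxyToAgent    stand-in for the MQTT request/response path
async function callTool ({ name, args, pat }, deps) {
    const { sourceByName, enforcePat, runPlatformTool, proxyToAgent, agentTimeoutMs = 30000 } = deps
    const source = sourceByName.get(name)
    if (!source) {
        return { isError: true, code: 'unknown_tool', message: `Unknown tool: ${name}` }
    }
    // PAT enforcement runs before either execution path.
    const denied = enforcePat(pat, name)
    if (denied) {
        return { isError: true, code: 'forbidden', message: denied }
    }
    try {
        if (source === 'platform') {
            return await runPlatformTool(name, args, pat)
        }
        return await withTimeout(proxyToAgent(name, args), agentTimeoutMs, `agent tool ${name}`)
    } catch (err) {
        // Failures and timeouts become a structured error instead of a thrown exception.
        return { isError: true, code: err.code || 'tool_failed', message: err.message }
    }
}
```

Only the agent path is wrapped in the timeout here, since that is the path where the expert agent or the browser may never answer.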

**MQTT communication with expert agent**
* Publishes tool calls on `ff/v1/mcp/<platformId>/<userId>/<sessionId>/<entityType>/<entityId>/req` with correlation ID and tool details in payload
* Subscribes to `ff/v1/mcp/<platformId>/<userId>/<sessionId>/<entityType>/<entityId>/res` per active user/session/entity combination (not a broad wildcard)
* Matches responses via correlation ID in payload as a safety net, but MQTT topic-level filtering is the primary routing mechanism
* Uses the existing long-lived platform MQTT client, no new connections
* ACL rules for `ff/v1/mcp/` topics: validate forge_platform client only, no user/team/RBAC checks
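
To make the topic layout and correlation handling concrete, here is a small sketch. The topic structure follows the bullets above; the payload field names (`correlationId`, `tool`, `args`, `result`) and the helpers `mcpTopics` and `createRequester` are assumptions for illustration, and `publish` stands in for the existing platform MQTT client.

```javascript
const { randomUUID } = require('node:crypto')

// Builds the request/response topic pair for one user/session/entity context.
function mcpTopics ({ platformId, userId, sessionId, entityType, entityId }) {
    const base = `ff/v1/mcp/${platformId}/${userId}/${sessionId}/${entityType}/${entityId}`
    return { req: `${base}/req`, res: `${base}/res` }
}

// Hypothetical request tracker. `publish(topic, payload)` is a stand-in for
// publishing through the long-lived platform MQTT client.
function createRequester (publish) {
    const pending = new Map() // correlationId -> { resolve, resTopic }
    return {
        send (context, tool, args) {
            const { req, res } = mcpTopics(context)
            const correlationId = randomUUID()
            const reply = new Promise((resolve) => {
                pending.set(correlationId, { resolve, resTopic: res })
            })
            publish(req, JSON.stringify({ correlationId, tool, args }))
            return reply
        },
        // Call this from the client's message handler; returns true if the message was consumed.
        handleMessage (topic, payload) {
            let msg
            try {
                msg = JSON.parse(String(payload))
            } catch (err) {
                return false
            }
            const entry = pending.get(msg?.correlationId)
            // The topic is the primary filter; the correlation ID is the safety net.
            if (!entry || entry.resTopic !== topic) {
                return false
            }
            pending.delete(msg.correlationId)
            entry.resolve(msg.result)
            return true
        }
    }
}
```

A response that carries a known correlation ID but arrives on a different context's `res` topic is ignored, which reflects the intent that topic-level filtering does the routing and the ID only guards against mix-ups. Timeouts and cleanup of abandoned `pending` entries are omitted from this sketch.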

**Subscription caching**
* Per-context response subscriptions (one per active user/session/entity combination) are cached in an in-process LRU cache (`lru-cache` directly, not the global `app.caches` abstraction, since subscription state is per-process and bound to the local MQTT client connection)
* ~24h TTL, extended on reuse. Each subsequent tool call for the same user/session/entity prolongs the subscription's life.
* On eviction (TTL expiry or LRU overflow), the `dispose` callback triggers an MQTT unsubscribe, so stale subscriptions don't accumulate on the platform client
* ACL check happens per new subscription, but it's cheap (client validation only, no DB queries)
* A broad wildcard subscription was considered and rejected: it creates message noise (Forge receives responses for all users and discards non-matching ones), gives weak isolation between users, and relies entirely on correlation IDs for routing rather than MQTT-level filtering
* This is separate from the tool annotation cache in #7559 (which caches tool metadata like `readOnlyHint`). Both use `lru-cache`, both are per-process, but they serve different purposes.
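
The caching behaviour above might look something like the sketch below. It assumes the `lru-cache` options `max`, `ttl`, `updateAgeOnGet`, and `dispose`; the cache class is passed in as a parameter so the sketch carries no hard dependency. `createSubscriptionCache`, the `max` default, and the promise-returning `client.subscribe` / `client.unsubscribe` wrapper are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000

// `LRUCache` is the class exported by lru-cache; `client` is a hypothetical wrapper
// around the platform MQTT client whose subscribe/unsubscribe return promises.
function createSubscriptionCache (LRUCache, client, { max = 1000, ttl = DAY_MS } = {}) {
    const cache = new LRUCache({
        max,
        ttl,
        updateAgeOnGet: true, // each reuse extends the subscription's life
        dispose: (value, topic, reason) => {
            if (reason === 'set') return // entry was overwritten, not removed
            // Unsubscribing a topic that was never subscribed is harmless.
            Promise.resolve(client.unsubscribe(topic)).catch(() => {})
        }
    })
    return {
        cache,
        // Resolves once the response topic is subscribed; subscribes at most once per topic.
        ensure (topic) {
            let pendingSub = cache.get(topic) // get() also refreshes the TTL
            if (!pendingSub) {
                // Cache the in-flight promise so concurrent callers share one subscribe.
                pendingSub = Promise.resolve(client.subscribe(topic))
                cache.set(topic, pendingSub)
                // On failure, drop the entry so the next call retries.
                pendingSub.catch(() => cache.delete(topic))
            }
            return pendingSub
        }
    }
}
```

With the real library this would be wired up as `createSubscriptionCache(require('lru-cache').LRUCache, client)`, assuming a version that uses the named export. One thing worth checking against the `lru-cache` documentation: expired entries are normally purged lazily, so the unsubscribe on TTL expiry may not fire until the entry is next touched unless something like `ttlAutopurge` is enabled.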

### Tests

* `tools/list` returns the union of platform tools and expert agent tools
* Calling a platform tool executes via `app.inject()` and returns the result
* Calling an agent tool proxies to the expert agent and returns the result
* PAT enforcement: read-only PAT cannot call write-annotated tools from either source
* PAT enforcement: team-scoped PAT restrictions apply to agent tools
* Expert agent unavailable: structured error or graceful degradation (platform tools still work)
* Timeout: unresponsive expert agent produces a structured error
* Tool source routing: tools are dispatched to the correct executor
* Subscription is reused for subsequent agent tool calls to the same user/session/entity (no re-subscribe)
* Subscription TTL is extended on reuse
* Expired subscriptions are unsubscribed (dispose callback fires)

### References

* Platform tool definitions: #7430
* PAT enforcement: #7559
* Expert agent MQTT tool list + execution: #7571
* Existing platform MQTT client: `forge/comms/commsClient.js`
* `sendCommandAwaitReply` pattern: `forge/comms/devices.js`
* MCP topic structure: #7424 (Key architectural decisions)
* Parent story: #7424
