feat(plugin)!: support pending marks from tool execution intercepts#350
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a canonical tool execution intercept outcome carrying pending marks, then threads that typed shape through runtime execution, plugin bridges, and language bindings with matching tests and fixtures. ChangesTool Execution Intercept Outcome Rollout
Estimated code review effort: 5 (Critical) | ~120 minutes Sequence Diagram(s)sequenceDiagram
participant Intercept
participant Runtime
participant ToolCall
participant Bridge
participant Tests
Intercept->>Runtime: return ToolExecutionInterceptOutcome
Runtime->>ToolCall: emit tool-end and pending marks
Bridge->>Intercept: validate/serialize typed outcome
Tests->>Bridge: assert wrapped result + pending_marks
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
127a806 to
b9dc629
Compare
Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
License DiffCompared against Lockfile license changesLockfile License ChangesRustAdded
Removed
Updated/Changed
NodeAdded
Removed
Updated/Changed
PythonAdded
Removed
Updated/Changed
Status output |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
go/nemo_relay/tools_test.go (1)
442-495: 📐 Maintainability & Code Quality | 🔵 Trivial | 🏗️ Heavy liftNo Go test exercises
PendingMarkspropagation.Every updated callback here (and across
callbacks_test.go,deregister_test.go,error_test.go,scope_local_test.go,intercepts_test.go,adaptive_plugin_test.go) only setsResultand leavesPendingMarksnil/empty. None of the Go tests construct a non-emptyPendingMarksslice or assert that marks emitted from a tool execution intercept reach the caller/lifecycle subscriber — despite pending-mark support being the headline feature added by this PR. This full pipeline test (TestToolFullPipelineInterceptsAndExecute) is a natural place to extend coverage with aPendingMarkSpecand an assertion on its emission/ordering.As per path instructions, "Tests should cover the behavior promised by the changed API surface, including error paths and cross-request isolation where relevant," and any Go API change "should include focused Go tests."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/nemo_relay/tools_test.go` around lines 442 - 495, Extend TestToolFullPipelineInterceptsAndExecute to cover PendingMarks propagation through the full ToolCallExecute pipeline. In the execution intercept callback and/or the tool callable path, return a non-empty PendingMarks slice using the existing ToolExecutionInterceptOutcome and PendingMarkSpec types, then assert the caller receives those marks in the expected order alongside the Result. Use the existing symbols ToolCallExecute, RegisterToolExecutionIntercept, and PendingMarkSpec to locate the flow, and mirror the same coverage pattern in the related callback/intercept tests if needed.Source: Path instructions
crates/worker/tests/worker_sdk_tests.rs (1)
508-519: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick winStrengthen pending-mark round-trip coverage at the worker RPC boundary.
The pending mark built here only sets
name, and the test only assertspending_marks[0].name. Per PR objectives, marks should retaincategory,category profile,data, andmetadataacross the boundary — this is the only test exercisingPendingMarkSpecthrough the newJsonEnvelope/gRPC serialization path, so a regression dropping those fields on encode/decode wouldn't be caught here.As per path instructions,
{crates/**/tests/**,...}: "Tests should cover the behavior promised by the changed API surface... Prefer assertions on lifecycle events, scope stacks, middleware ordering, and binding parity over shallow smoke tests."♻️ Proposed test strengthening
ctx.register_tool_execution_intercept("tool-exec", 1, move |_, value, next: ToolNext| { let runtime = tool_runtime.clone(); async move { ... Ok(ToolExecutionInterceptOutcome::new(set_json_field( next_value, "phase", "tool_exec", )) .with_pending_mark( PendingMarkSpec::builder() .name("worker.tool.execution") + .category("custom") + .data(json!({"count": 1})) + .metadata(json!({"source": "worker"})) .build(), )) } });let tool_outcome = invoke_tool_execution(...).await; assert_eq!(tool_outcome.pending_marks.len(), 1); - assert_eq!(tool_outcome.pending_marks[0].name, "worker.tool.execution"); + let mark = &tool_outcome.pending_marks[0]; + assert_eq!(mark.name, "worker.tool.execution"); + assert_eq!(mark.category.as_deref(), Some("custom")); + assert_eq!(mark.data, Some(json!({"count": 1}))); + assert_eq!(mark.metadata, Some(json!({"source": "worker"})));Also applies to: 1546-1555
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/worker/tests/worker_sdk_tests.rs` around lines 508 - 519, The PendingMarkSpec round-trip test in worker_sdk_tests should verify all fields survive the worker RPC boundary, not just name. Update the tool invocation/assertions around invoke_tool_execution, tool_outcome, and tool_exec to populate and compare category, category profile, data, and metadata alongside pending_marks[0].name. Strengthen the test so it exercises the JsonEnvelope/gRPC serialization path and fails if any PendingMarkSpec field is dropped or altered during encode/decode.Source: Path instructions
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/ffi/src/callable.rs`:
- Around line 390-392: The ownership/lifetime handling in
callable::json_result_from_ptr usage is unsafe because an invalid non-null
callback string can return early before result_ptr is freed. Update the tool
execution intercept callback flow in callable.rs so result_ptr is always
released before any parse error is propagated, ideally by ensuring the free
happens on the error path or via a scope/guard around the json_result_from_ptr
call. Keep the fix localized to the callback handling around
json_result_from_ptr and nemo_relay_string_free_internal.
In `@crates/ffi/tests/unit/callable_tests.rs`:
- Around line 292-294: The current test only verifies the empty pending-marks
path, so add another callback case around wrap_tool_exec_intercept_fn that
returns a populated pending_marks entry and assert the wrapped result preserves
its name, category, profile, data, and metadata. Update the callable_tests
coverage to exercise the FFI wrapper’s promised API behavior by checking both
the intercepted payload and a non-empty pending_marks array.
In `@crates/node/plugin.d.ts`:
- Around line 48-61: The inline pendingMarks shape in
registerLlmRequestIntercept duplicates the new PendingMarkSpec interface, so
update the registerLlmRequestIntercept callback/options types to reference
PendingMarkSpec[] instead of redefining the object shape. Keep the
PendingMarkSpec interface as the single source of truth and reuse it wherever
the pending mark array is typed, including the ToolExecutionInterceptOutcome
contract if applicable.
In `@crates/node/tests/tools_tests.mjs`:
- Around line 630-670: The Node tool execution test in toolCallExecute should
also verify event ordering, not just parentage. In the existing
replaced_tool/intercept flow, capture the tool end event alongside the pending
mark and assert the pending mark is observed after the end event, matching the
contract already covered by
test_tool_execution_outcome_marks_follow_end_with_tool_parentage. Use the
existing event collection in tools_tests.mjs and the current start/mark
assertions to add an end-before-mark ordering check.
In `@crates/python/src/py_callable.rs`:
- Around line 440-447: The same PyToolExecutionInterceptOutcome extraction and
FlowError::Internal mapping is duplicated in both the sync and async branches of
py_callable.rs. Extract that repeated logic into a small helper such as
extract_tool_outcome and have both the direct result path and the awaited
coroutine path call it, so the error message and extraction behavior stay
consistent in the tool interception flow.
---
Outside diff comments:
In `@crates/worker/tests/worker_sdk_tests.rs`:
- Around line 508-519: The PendingMarkSpec round-trip test in worker_sdk_tests
should verify all fields survive the worker RPC boundary, not just name. Update
the tool invocation/assertions around invoke_tool_execution, tool_outcome, and
tool_exec to populate and compare category, category profile, data, and metadata
alongside pending_marks[0].name. Strengthen the test so it exercises the
JsonEnvelope/gRPC serialization path and fails if any PendingMarkSpec field is
dropped or altered during encode/decode.
In `@go/nemo_relay/tools_test.go`:
- Around line 442-495: Extend TestToolFullPipelineInterceptsAndExecute to cover
PendingMarks propagation through the full ToolCallExecute pipeline. In the
execution intercept callback and/or the tool callable path, return a non-empty
PendingMarks slice using the existing ToolExecutionInterceptOutcome and
PendingMarkSpec types, then assert the caller receives those marks in the
expected order alongside the Result. Use the existing symbols ToolCallExecute,
RegisterToolExecutionIntercept, and PendingMarkSpec to locate the flow, and
mirror the same coverage pattern in the related callback/intercept tests if
needed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: d012fd55-3115-46cf-9ead-c919f10ae59f
📒 Files selected for processing (62)
crates/adaptive/src/intercepts.rscrates/adaptive/tests/unit/intercepts_tests.rscrates/adaptive/tests/unit/runtime_features_tests.rscrates/core/src/api/runtime/callbacks.rscrates/core/src/api/runtime/state.rscrates/core/src/api/tool.rscrates/core/src/plugin/dynamic/native.rscrates/core/src/plugin/dynamic/worker.rscrates/core/src/plugins/nemo_guardrails/python.rscrates/core/src/plugins/nemo_guardrails/remote.rscrates/core/tests/fixtures/native_plugin/src/lib.rscrates/core/tests/fixtures/worker_plugin/src/main.rscrates/core/tests/integration/api_surface_tests.rscrates/core/tests/integration/middleware_tests.rscrates/core/tests/integration/native_plugin_tests.rscrates/core/tests/integration/worker_plugin_tests.rscrates/core/tests/unit/plugin_tests.rscrates/ffi/nemo_relay.hcrates/ffi/src/callable.rscrates/ffi/tests/integration/api_tests.rscrates/ffi/tests/unit/api/registry_tests.rscrates/ffi/tests/unit/api_tests.rscrates/ffi/tests/unit/callable_tests.rscrates/node/plugin.d.tscrates/node/src/api/mod.rscrates/node/src/callable.rscrates/node/tests/scope_local_tests.mjscrates/node/tests/tools_tests.mjscrates/plugin/src/lib.rscrates/plugin/tests/typed_callbacks.rscrates/python/src/py_callable.rscrates/python/src/py_types/core.rscrates/python/src/py_types/mod.rscrates/python/tests/coverage/coverage_tests.rscrates/python/tests/coverage/py_api_coverage_tests.rscrates/python/tests/coverage/py_callable_coverage_tests.rscrates/python/tests/coverage/py_plugin_coverage_tests.rscrates/types/src/api/event.rscrates/types/src/api/tool.rscrates/types/tests/serialization_tests.rscrates/worker-proto/proto/nemo/relay/worker/v1/plugin_worker.protocrates/worker/src/lib.rscrates/worker/tests/worker_sdk_tests.rsexamples/rust-native-plugin/src/lib.rsgo/nemo_relay/adaptive_plugin_test.gogo/nemo_relay/callbacks.gogo/nemo_relay/callbacks_test.gogo/nemo_relay/deregister_test.gogo/nemo_relay/error_test.gogo/nemo_relay/intercepts/intercepts_test.gogo/nemo_relay/scope_local_test.gogo/nemo_relay/tools_test.gopython/nemo_relay/__init__.pypython/nemo_relay/__init__.pyipython/nemo_relay/_native.pyipython/nemo_relay/intercepts.pypython/nemo_relay/scope_local.pypython/plugin/src/nemo_relay_plugin/__init__.pypython/plugin/src/nemo_relay_plugin/_api.pypython/tests/plugin/test_worker_sdk.pypython/tests/test_scope_local.pypython/tests/test_tools.py
Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
|
/merge |
#### Overview Document the canonical LLM request-intercept and tool-execution intercept outcome contracts. - [x] I confirm this contribution is my own work, or I have the right to submit it under this project's license. - [x] I searched existing issues and open pull requests, and this does not duplicate existing work. #### Details - Resolve reviewer feedback in the LLM request-intercept outcome reference with clearer purpose, lifecycle, diagram, and binding-contract explanations. - Add a parallel tool-execution outcome reference covering raw `next(args)` behavior, pending marks, end-before-mark lifecycle ordering, migration, and binding contracts. - Update the Python, Node.js, and Rust tool execution middleware examples to return the canonical outcome. - Keep this PR documentation-only; it contains no runtime changes. #### Where should the reviewer start? - `docs/reference/llm-request-intercept-outcomes.mdx` - `docs/reference/tool-execution-intercept-outcomes.mdx` - `docs/instrument-applications/advanced-guide.mdx` #### Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to) - Relates to #327 - Relates to #350 #### Validation - Generated Python, Node.js, and Rust API reference pages. - Fern check completed with zero errors; redirects validation was skipped because no Fern token is configured. - Fern strict broken-link validation passed. - `git diff --check` ## Summary by CodeRabbit * **Documentation** * Added a new reference page defining the canonical format and lifecycle rules for request-intercept outcomes (including codec-aware validation and migration guidance). * Added a new reference page defining the canonical format and lifecycle rules for tool-execution intercept outcomes (including pending-mark handling and discard behavior). * **Documentation** * Updated request-intercept and middleware/tool-policy examples across Python, Rust, and Node.js to return structured outcome objects and to derive rewritten requests from captured outcomes. * Clarified codec-aware request authority and content/annotation responsibilities in the rewritten provider request flow. Authors: - Bryan Bednarski (https://github.com/bbednarski9) Approvers: - https://github.com/lvojtku - Will Killian (https://github.com/willkill07) - Maryam Najafian (https://github.com/mnajafian-nv) URL: #341
Summary
ToolExecutionInterceptOutcomethe canonical return type for every registered tool execution interceptnext(args)continuation as raw JSON while Relay retains downstream pending marksgrpc-v1workers, C, Go, Python, and NodeMotivation
Tool execution intercepts only learn the final result after execution, while Relay owns the managed tool lifecycle and its UUID. Returning a canonical outcome keeps plugin control data separate from the application-visible tool result and lets Relay materialize marks with correct parentage.
This follows the outcome-return convention already used by LLM request intercepts instead of introducing a second registration family or an implementation-specific callback variant.
Behavior
outcome.resultnext(...)calls preserve invocation order rather than completion orderBoundary contracts
ToolExecutionInterceptOutcomegrpc-v1workers return aToolExecutionInterceptResultcontaining the exactnemo.relay.ToolExecutionInterceptOutcome@1enveloperesultand optionalpending_marksToolExecutionInterceptOutcomeToolExecutionInterceptOutcome{ result, pendingMarks? }Breaking changes
next(args)result are unchangedmainValidation
cargo test --workspace --all-targets -- --test-threads=1cargo check --workspace --all-targetscargo clippy --workspace --all-targets -- -D warningscargo fmt --all -- --checkgo test ./...andgo vet ./...git diff --checkDocumentation
No documentation changes are included. Follow-up documentation remains in #341.
Summary by CodeRabbit
resultplus optionalpendingMarks, with consistent support across Rust, Node, Python, Go, and the worker SDK.pendingMarkstiming semantics.