docs: document LLM request and tool execution outcomes #341
Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
Walkthrough
Documentation now standardizes LLM request-intercept outcomes and tool-execution intercept outcomes, updates example code to use outcome objects, and adds new reference pages covering serialization, lifecycle, binding, and migration rules.
Changes: Intercept Outcome Documentation
Estimated code review effort: 2 (Simple) | ~12 minutes
Pre-merge checks: 5 passed
#### Overview
Finalize one canonical LLM request-intercept outcome across the Rust runtime, built-in and adaptive plugins, native ABI v1, `grpc-v1` workers, public C FFI, Go, Python, Node.js, and WebAssembly.
Request intercepts can rewrite the provider request, carry an optional normalized annotation, and schedule ordered marks for the managed LLM lifecycle:
```json
{
"request": {"headers": {}, "content": {}},
"annotated_request": null,
"pending_marks": []
}
```
`request` is required. `annotated_request` defaults to `null`, and `pending_marks` defaults to an empty list. Each pending mark contains only its name, optional category and category profile, data, and metadata; Relay continues to own event UUIDs, parent UUIDs, and timestamps.
The finalized contract also defines one provider-body source of truth. Without a request codec, `outcome.request.content` is authoritative. With a codec, `outcome.annotated_request` is required and authoritative, `outcome.request.content` is read-only context, and `outcome.request.headers` remains writable.
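As a minimal sketch of the defaulting rules above, the following Python function fills in the optional fields of a decoded outcome. The name `normalize_outcome` is hypothetical and is not part of any Relay SDK; only the field names and defaults come from the description.

```python
import json
from typing import Any


def normalize_outcome(raw: dict[str, Any]) -> dict[str, Any]:
    """Apply the documented defaults to a decoded outcome object (illustrative only)."""
    if "request" not in raw:
        # `request` is the only required field.
        raise ValueError("outcome.request is required")
    return {
        "request": raw["request"],
        # Defaults to null when omitted on input.
        "annotated_request": raw.get("annotated_request"),
        # Defaults to an empty list when omitted on input.
        "pending_marks": list(raw.get("pending_marks", [])),
    }


minimal = json.loads('{"request": {"headers": {}, "content": {}}}')
print(json.dumps(normalize_outcome(minimal)))
```

The printed object always carries all three fields, which mirrors the statement that canonical serialization includes `request`, `annotated_request`, and `pending_marks`.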
- [x] I confirm this contribution is my own work, or I have the right to submit it under this project's license.
- [x] I searched existing issues and open pull requests, and this does not duplicate existing work.
#### Why
Request intercepts run before Relay creates the managed LLM handle. A mark emitted directly from an intercept therefore cannot reliably attach to that future LLM scope. Returning pending mark specifications lets the lifecycle owner emit them at the correct boundary without leaking control data into provider requests, annotations, codecs, sanitizers, or execution intercepts.
Codec-aware interception also previously allowed two conflicting provider-body representations: an intercept could change both the raw request content and its normalized annotation, while Relay later encoded only the annotation. Making authority explicit prevents raw content edits from being silently discarded.
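The authority rule can be sketched as a single check. This is an illustration under stated assumptions, not the Rust runtime's implementation: `provider_body` and its parameters are hypothetical names, and the real codec encoding step is reduced to returning the annotation.

```python
from typing import Any


def provider_body(original_content: Any, outcome: dict[str, Any], has_codec: bool) -> Any:
    """Return the value the provider body is derived from, or raise on a violation."""
    request = outcome["request"]
    annotation = outcome.get("annotated_request")
    if not has_codec:
        # Without a codec, the raw content is authoritative.
        return request["content"]
    if annotation is None:
        raise ValueError("codec path: annotated_request is required")
    if request["content"] != original_content:
        # Raw content is read-only context on the codec path.
        raise ValueError("codec path: request.content must not be mutated")
    # A real codec would encode the annotation into the provider body here.
    return annotation
```

Rejecting the mismatch at the intercept that caused it, rather than silently preferring one representation, is what keeps raw-content edits from being discarded unnoticed. Header changes are not checked here because `request.headers` stays writable on both paths.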
#### Details
- Make `LlmRequestInterceptOutcome` the only Rust callback result and keep one `register_llm_request_intercept` registration family for global, scope-local, plugin-context, and adaptive paths.
- Propagate each accepted request and annotation to the next intercept while appending pending marks in effective middleware order.
- Without a request codec, use `outcome.request.content` as the provider body.
- With a request codec, require `outcome.annotated_request`, encode the provider body from it, and allow header changes only through `outcome.request.headers`.
- Reject raw `request.content` mutations or missing annotations at the offending codec-path intercept, before later middleware, LLM lifecycle creation, mark emission, or provider invocation.
- Preserve marks from an intercept that breaks the chain; discard all accumulated marks if any intercept fails.
- Return the complete outcome from standalone request-intercept helpers. These helpers expose pending marks but do not emit them because they do not own an LLM lifecycle.
- After successful interception, create the LLM handle and capture one subscriber snapshot before emitting lifecycle events.
- Emit LLM start at `T`, every pending mark at `T + 1µs` in returned order with the LLM UUID as parent, and LLM end at or after `T + 1µs`.
- Apply the same behavior to streaming and non-streaming managed execution, including provider errors and stream finalization.
- Keep pending marks separate from provider-visible requests and annotations.
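The chaining and timing rules in this list can be sketched roughly as follows. All names (`run_chain`, `schedule`, the event dictionaries, `tag`) are hypothetical and chosen for illustration; chain-breaking intercepts and LLM end emission are not modeled, and timestamps are plain integers in microseconds.

```python
from typing import Any, Callable

Outcome = dict[str, Any]
Intercept = Callable[[Any, Any], Outcome]


def run_chain(request: Any, annotation: Any, intercepts: list[Intercept]) -> Outcome:
    """Feed each accepted request and annotation to the next intercept."""
    marks: list[dict[str, Any]] = []
    for intercept in intercepts:
        # An exception here propagates, so the accumulated marks are dropped.
        outcome = intercept(request, annotation)
        request = outcome["request"]
        annotation = outcome.get("annotated_request")
        marks.extend(outcome.get("pending_marks", []))  # effective middleware order
    return {"request": request, "annotated_request": annotation, "pending_marks": marks}


def schedule(start_us: int, llm_uuid: str, marks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Place LLM start at T and every pending mark at T + 1 microsecond."""
    events = [{"kind": "llm_start", "at_us": start_us, "uuid": llm_uuid}]
    for mark in marks:  # returned order is preserved
        events.append({"kind": "mark", "name": mark["name"],
                       "at_us": start_us + 1, "parent": llm_uuid})
    return events


def tag(name: str) -> Intercept:
    """Build a toy intercept that passes the request through and adds one mark."""
    def intercept(request: Any, annotation: Any) -> Outcome:
        return {"request": request, "annotated_request": annotation,
                "pending_marks": [{"name": name}]}
    return intercept


final = run_chain({"headers": {}, "content": {}}, None, [tag("first"), tag("second")])
events = schedule(1_000, "llm-1", final["pending_marks"])
print([e.get("name", e["kind"]) for e in events])  # ['llm_start', 'first', 'second']
```

Because the marks are only data until `schedule` runs, the lifecycle owner can attach them to the LLM scope after the handle exists, which is the reason the intercepts return specifications instead of emitting events themselves.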
#### Boundary contracts
- **Native ABI v1:** return one host-owned outcome JSON string. Remove the private annotation-envelope transport and append required outcome-contract version fields to both host and plugin descriptor tables so stale binaries fail before callback invocation.
- **`grpc-v1`:** return one `JsonEnvelope` using schema `nemo.relay.LlmRequestInterceptOutcome@1`.
- **Public C FFI:** return one owned `char **out_outcome_json` and add `nemo_relay_llm_request_intercept_outcome_json_new`.
- **Go:** return `(LLMRequestInterceptOutcome, error)` and expose request, outcome, and pending-mark DTOs.
- **Python:** return `LLMRequestInterceptOutcome` and export `PendingMarkSpec`.
- **Node.js and WebAssembly:** return `{ request, annotated?, pendingMarks? }`. Binding-owned pending-mark DTOs use `categoryProfile`; canonical event and outcome JSON retains `category_profile`.
- **Rust native and worker SDKs:** expose only the canonical callback and registration method.
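The naming split between binding DTOs and canonical JSON amounts to a single key rename at the boundary. The sketch below is written in Python for consistency with the other examples, although the actual conversion lives in the Node.js and WebAssembly bindings; `mark_dto_to_wire` is a hypothetical name, and rejecting the wire spelling on input is an assumption based on the WebAssembly test note in this description.

```python
from typing import Any


def mark_dto_to_wire(dto: dict[str, Any]) -> dict[str, Any]:
    """Convert a camelCase pending-mark DTO to the canonical snake_case form."""
    if "category_profile" in dto:
        # Assumption: the binding accepts only the camelCase spelling.
        raise ValueError("pending-mark DTOs use categoryProfile, not category_profile")
    wire = {key: value for key, value in dto.items() if key != "categoryProfile"}
    if "categoryProfile" in dto:
        wire["category_profile"] = dto["categoryProfile"]
    return wire


print(mark_dto_to_wire({"name": "cache-hit", "categoryProfile": "default"}))
```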
#### Breaking changes
This intentionally finalizes unpublished contracts in place:
- Rust and Python tuple results are removed.
- C and Go split outputs are removed.
- Mark-specific parallel registration variants are removed.
- The native annotation metadata envelope and fallback parser are removed.
- Native ABI host and plugin tables require the finalized outcome-contract field.
- The `grpc-v1` request-intercept result is replaced by the canonical outcome envelope.
- Codec-path intercepts must return an annotation and may no longer mutate raw `request.content`; malformed outcomes fail before lifecycle creation.
- Node.js and WebAssembly pending-mark objects use `categoryProfile` instead of the Rust/wire name `category_profile`.
All development native plugins and workers must rebuild against this version.
#### Where should the reviewer start?
1. `crates/types/src/api/event.rs` and `crates/types/src/api/llm.rs` for the canonical data contract.
2. `crates/core/src/api/runtime/state.rs`, `crates/core/src/api/shared.rs`, `crates/core/src/api/llm.rs`, and `crates/core/src/stream.rs` for chaining, codec authority, validation, and lifecycle behavior.
3. `crates/plugin/src/lib.rs`, `crates/core/src/plugin/dynamic/native.rs`, and `crates/core/src/plugin/dynamic/worker.rs` for native and worker boundaries.
4. `crates/ffi`, `go/nemo_relay`, `crates/python`, `crates/node`, and `crates/wasm` for binding contracts and DTO conversion.
5. `crates/core/tests/integration/middleware_tests.rs`, `crates/core/tests/integration/pipeline_tests.rs`, `crates/plugin/tests/typed_callbacks.rs`, and the binding tests for lifecycle, codec-authority, and boundary coverage.
The full contract, request-authority diagram, and migration notes are tracked in [companion documentation PR #341](#341), which should merge immediately after this PR.
#### Testing
- `cargo test --workspace --all-targets`
- `cargo clippy --workspace --all-targets -- -D warnings`
- `cargo fmt --all -- --check`
- Python codec and worker SDK coverage passes, including malformed codec-path outcomes and canonical worker envelopes.
- Node.js LLM suite: **38 passed**, including `categoryProfile` input/output conversion and codec-authority rejection.
- Go: all `go/nemo_relay/...` packages passed, including codec-authority coverage; `go vet ./...` passes.
- Native SDK: **52 passed**.
- Worker SDK: **9 passed**; worker protocol tests: **6 passed**.
- C FFI: unit and integration suites passed, including owned outcome allocation and malformed/null input coverage.
- WebAssembly native Rust tests: **13 passed**, including camelCase pending-mark DTO round trips and rejection of the wire-only `category_profile` spelling.
- Repository formatting, strict Clippy, Ruff, Prettier, type, lockfile, FFI-header, and applicable pre-commit checks pass.
`wasm-pack` and the `wasm32-unknown-unknown` Rust target were not available for the package-level Wasm suite. Environment-dependent socket and external-network tests were not used to validate these binding changes.
#### Related Issues
- Relates to #296
## Summary by CodeRabbit
* **New Features**
  * LLM request intercepts can now return a unified outcome that includes the rewritten request, optional annotated request, and pending marks.
  * Pending marks are now emitted alongside LLM lifecycle events and supported across SDKs and plugins.
* **Bug Fixes**
  * Improved consistency of LLM event timing and parent/child relationships.
  * Added stricter validation so intercepts that modify raw request content or omit required annotations are rejected when needed.
Authors:
- Bryan Bednarski (https://github.com/bbednarski9)
Approvers:
- Will Killian (https://github.com/willkill07)
URL: #327
| {/* SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| SPDX-License-Identifier: Apache-2.0 */} | ||
|
|
||
| Every LLM request intercept returns one canonical outcome: |
Review comment:
Can you provide a description about what this does? Is this the LLM request intercept or the outcome?
|
|
||
| `request` is required. `annotated_request` defaults to `null` when omitted on | ||
| input, and `pending_marks` defaults to an empty list. Canonical serialization | ||
| includes all three fields. A pending mark contains only `name`, optional |
Suggested change:
| includes all three fields. A pending mark contains only `name`, optional | |
| includes all three fields. A pending mark only contains `name`, optional |
|
|
||
| ## Request Authority | ||
|
|
||
| The provider-body source of truth depends only on whether a request codec is |
Suggested change:
| The provider-body source of truth depends only on whether a request codec is | |
| The provider-body source of truth only depends on whether a request codec is |
| annotation, including its flattened `extra` fields for provider-specific data. | ||
| Relay rejects a changed raw body or missing annotation at the offending | ||
| intercept before invoking later middleware or creating an LLM lifecycle. | ||
|
|
Review comment:
Should begin with "The following example describes/does xyz..."
| Python callbacks return `LLMRequestInterceptOutcome`; Rust callbacks return | ||
| `LlmRequestInterceptOutcome`; Go callbacks return | ||
| `LLMRequestInterceptOutcome`; and Node.js and WebAssembly callbacks return | ||
| `{ request, annotated?, pendingMarks? }`, with `categoryProfile` on each | ||
| JavaScript pending-mark DTO. The canonical JSON forms retain `pending_marks` | ||
| and `category_profile`. Public C callbacks write one owned canonical outcome | ||
| JSON string. Native ABI v1 uses one host-owned outcome JSON string. Rust and | ||
| Python `grpc-v1` worker SDKs return their canonical outcome type in a | ||
| `JsonEnvelope` whose schema is | ||
| `nemo.relay.LlmRequestInterceptOutcome@1`. |
Suggested change:
| Python callbacks return `LLMRequestInterceptOutcome`; Rust callbacks return | |
| `LlmRequestInterceptOutcome`; Go callbacks return | |
| `LLMRequestInterceptOutcome`; and Node.js and WebAssembly callbacks return | |
| `{ request, annotated?, pendingMarks? }`, with `categoryProfile` on each | |
| JavaScript pending-mark DTO. The canonical JSON forms retain `pending_marks` | |
| and `category_profile`. Public C callbacks write one owned canonical outcome | |
| JSON string. Native ABI v1 uses one host-owned outcome JSON string. Rust and | |
| Python `grpc-v1` worker SDKs return their canonical outcome type in a | |
| `JsonEnvelope` whose schema is | |
| `nemo.relay.LlmRequestInterceptOutcome@1`. | |
| The following are callbacks and what they return: | |
| - Python callbacks return `LLMRequestInterceptOutcome` | |
| - Rust callbacks return `LlmRequestInterceptOutcome` | |
| - Go callbacks return `LLMRequestInterceptOutcome` | |
| - Node.js and WebAssembly callbacks return `{ request, annotated?, pendingMarks? }`, with `categoryProfile` on each JavaScript pending-mark DTO. | |
| The canonical JSON forms retain `pending_marks` and `category_profile`. Public C callbacks write one owned canonical outcome JSON string. Native ABI v1 uses one host-owned outcome JSON string. Rust and Python `grpc-v1` worker SDKs return their canonical outcome type in a | |
| `JsonEnvelope` whose schema is `nemo.relay.LlmRequestInterceptOutcome@1`. |
| ## Managed Lifecycle | ||
|
|
||
| Managed execution runs all effective global and scope-local intercepts before | ||
| creating the LLM handle. Each accepted request/annotation pair feeds the next |
Suggested change:
| creating the LLM handle. Each accepted request/annotation pair feeds the next | |
| creating the LLM handle. Each accepted request or annotation pair feeds the next |
#### Overview
Document the canonical LLM request-intercept and tool-execution intercept outcome contracts.
#### Details
`next(args)` behavior, pending marks, end-before-mark lifecycle ordering, migration, and binding contracts.
#### Where should the reviewer start?
- `docs/reference/llm-request-intercept-outcomes.mdx`
- `docs/reference/tool-execution-intercept-outcomes.mdx`
- `docs/instrument-applications/advanced-guide.mdx`
#### Related Issues
(use one of the action keywords Closes / Fixes / Resolves / Relates to)
#### Validation
- `git diff --check`
## Summary by CodeRabbit