docs: document LLM request and tool execution outcomes by bbednarski9 · Pull Request #341 · NVIDIA/NeMo-Relay

bbednarski9 · 2026-07-01T13:59:06Z

Overview

Document the canonical LLM request-intercept and tool-execution intercept outcome contracts.

I confirm this contribution is my own work, or I have the right to submit it under this project's license.
I searched existing issues and open pull requests, and this does not duplicate existing work.

Details

- Resolve reviewer feedback in the LLM request-intercept outcome reference with clearer purpose, lifecycle, diagram, and binding-contract explanations.
- Add a parallel tool-execution outcome reference covering raw `next(args)` behavior, pending marks, end-before-mark lifecycle ordering, migration, and binding contracts.
- Update the Python, Node.js, and Rust tool execution middleware examples to return the canonical outcome.
- Keep this PR documentation-only; it contains no runtime changes.

Where should the reviewer start?

- `docs/reference/llm-request-intercept-outcomes.mdx`
- `docs/reference/tool-execution-intercept-outcomes.mdx`
- `docs/instrument-applications/advanced-guide.mdx`

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Relates to feat(plugin)!: support pending marks from LLM intercepts #327
Relates to feat(plugin)!: support pending marks from tool execution intercepts #350

Validation

- Generated Python, Node.js, and Rust API reference pages.
- Fern check completed with zero errors; redirects validation was skipped because no Fern token is configured.
- Fern strict broken-link validation passed.
- `git diff --check`

Summary by CodeRabbit

Documentation
- Added a new reference page defining the canonical format and lifecycle rules for request-intercept outcomes.
- Added a new reference page defining the canonical format and lifecycle rules for tool-execution intercept outcomes.
- Updated request-intercept and middleware/tool-policy examples across Python, Rust, and Node.js to use outcome objects and the revised return shapes.
- Clarified codec-aware behavior, validation/error handling, and how rewritten requests/annotations must be derived in managed execution.

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

coderabbitai · 2026-07-01T13:59:14Z

Walkthrough

Documentation now standardizes LLM request-intercept outcomes and tool-execution intercept outcomes, updates example code to use outcome objects, and adds new reference pages covering serialization, lifecycle, binding, and migration rules.

Changes

Intercept Outcome Documentation

| Layer / File(s) | Summary |
|---|---|
| Canonical LLM request-intercept reference `docs/reference/llm-request-intercept-outcomes.mdx` | Defines the canonical outcome fields, request authority rules, intercept resolution flow, language/ABI mappings, lifecycle timing, and migration notes for LLM request intercepts. |
| Canonical tool-intercept reference `docs/reference/tool-execution-intercept-outcomes.mdx` | Defines the canonical tool execution outcome shape, continuation semantics, lifecycle event ordering, binding mappings, and migration guidance. |
| Build-plugins examples updated to outcome return type `docs/build-plugins/code-examples.mdx`, `docs/build-plugins/register-behavior.mdx` | Python and Rust `add_header` intercept examples now return `LLMRequestInterceptOutcome` or `LlmRequestInterceptOutcome` objects instead of tuples. |
| Consumer-side outcome usage `docs/instrument-applications/code-examples.mdx`, `docs/integrate-into-frameworks/code-examples.mdx` | Python, Node.js, and Rust examples now capture intercept results as `outcome` and read `outcome.request` before conditional execution or downstream use. |
| Provider-codecs workflow and example updates `docs/integrate-into-frameworks/provider-codecs.mdx` | Documents request-codec request authority and rejection behavior, and updates the Python example to return `LLMRequestInterceptOutcome` objects. |

Estimated code review effort: 2 (Simple) | ~12 minutes

Possibly related PRs

NVIDIA/NeMo-Relay#327: Covers the same intercept outcome contract shift, but through runtime/FFI and payload-shape changes rather than documentation updates.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title follows Conventional Commits and accurately summarizes the documentation-only changes. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description matches the required template sections and includes overview, details, reviewer start points, related issues, and checklist items. |


github-actions · 2026-07-01T14:03:37Z

Fern docs preview: https://nvidia-preview-pull-request-341.docs.buildwithfern.com/nemo/relay

#### Overview

Finalize one canonical LLM request-intercept outcome across the Rust runtime, built-in and adaptive plugins, native ABI v1, `grpc-v1` workers, public C FFI, Go, Python, Node.js, and WebAssembly.

Request intercepts can rewrite the provider request, carry an optional normalized annotation, and schedule ordered marks for the managed LLM lifecycle:

```json
{
  "request": {"headers": {}, "content": {}},
  "annotated_request": null,
  "pending_marks": []
}
```

`request` is required. `annotated_request` defaults to `null`, and `pending_marks` defaults to an empty list. Each pending mark contains only its name, optional category and category profile, data, and metadata; Relay continues to own event UUIDs, parent UUIDs, and timestamps.

The finalized contract also defines one provider-body source of truth. Without a request codec, `outcome.request.content` is authoritative. With a codec, `outcome.annotated_request` is required and authoritative, `outcome.request.content` is read-only context, and `outcome.request.headers` remains writable.

- [x] I confirm this contribution is my own work, or I have the right to submit it under this project's license.
- [x] I searched existing issues and open pull requests, and this does not duplicate existing work.

#### Why

Request intercepts run before Relay creates the managed LLM handle. A mark emitted directly from an intercept therefore cannot reliably attach to that future LLM scope. Returning pending mark specifications lets the lifecycle owner emit them at the correct boundary without leaking control data into provider requests, annotations, codecs, sanitizers, or execution intercepts.

Codec-aware interception also previously allowed two conflicting provider-body representations: an intercept could change both the raw request content and its normalized annotation, while Relay later encoded only the annotation. Making authority explicit prevents raw content edits from being silently discarded.
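The request-authority rule described in the Overview can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Relay's implementation or its Python binding: the `Request` and `Outcome` dataclasses and the `provider_body` helper are hypothetical names, pending marks are modeled as plain dictionaries, and the codec's encoding step is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Request:
    # Hypothetical stand-in for the provider request (headers plus body content).
    headers: dict[str, str] = field(default_factory=dict)
    content: dict[str, Any] = field(default_factory=dict)


@dataclass
class Outcome:
    # Hypothetical stand-in mirroring the canonical JSON fields and defaults:
    # `request` is required, the other two fall back to null / empty list.
    request: Request
    annotated_request: Optional[dict[str, Any]] = None
    pending_marks: list[dict[str, Any]] = field(default_factory=list)


def provider_body(before: Request, outcome: Outcome, has_codec: bool) -> dict[str, Any]:
    """Pick the authoritative provider body for one intercept outcome."""
    if not has_codec:
        # No codec: the raw request content is the source of truth.
        return outcome.request.content
    if outcome.annotated_request is None:
        raise ValueError("codec path requires annotated_request")
    if outcome.request.content != before.content:
        raise ValueError("codec path treats request.content as read-only")
    # Codec path: the body is derived from the annotation. A real codec would
    # encode it; this sketch returns the annotation unchanged.
    return outcome.annotated_request


# Header edits stay legal on the codec path; only the raw body is frozen.
before = Request(content={"model": "m"})
rewritten = Outcome(
    request=Request(headers={"x-trace": "1"}, content={"model": "m"}),
    annotated_request={"model": "m", "temperature": 0},
)
body = provider_body(before, rewritten, has_codec=True)
```

Raising at the offending outcome, rather than silently preferring one representation, matches the stated goal of never discarding a raw content edit without telling the author of the intercept.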
#### Details

- Make `LlmRequestInterceptOutcome` the only Rust callback result and keep one `register_llm_request_intercept` registration family for global, scope-local, plugin-context, and adaptive paths.
- Propagate each accepted request and annotation to the next intercept while appending pending marks in effective middleware order.
- Without a request codec, use `outcome.request.content` as the provider body.
- With a request codec, require `outcome.annotated_request`, encode the provider body from it, and allow header changes only through `outcome.request.headers`.
- Reject raw `request.content` mutations or missing annotations at the offending codec-path intercept, before later middleware, LLM lifecycle creation, mark emission, or provider invocation.
- Preserve marks from an intercept that breaks the chain; discard all accumulated marks if any intercept fails.
- Return the complete outcome from standalone request-intercept helpers. These helpers expose pending marks but do not emit them because they do not own an LLM lifecycle.
- After successful interception, create the LLM handle and capture one subscriber snapshot before emitting lifecycle events.
- Emit LLM start at `T`, every pending mark at `T + 1µs` in returned order with the LLM UUID as parent, and LLM end at or after `T + 1µs`.
- Apply the same behavior to streaming and non-streaming managed execution, including provider errors and stream finalization.
- Keep pending marks separate from provider-visible requests and annotations.

#### Boundary contracts

- **Native ABI v1:** return one host-owned outcome JSON string. Remove the private annotation-envelope transport and append required outcome-contract version fields to both host and plugin descriptor tables so stale binaries fail before callback invocation.
- **`grpc-v1`:** return one `JsonEnvelope` using schema `nemo.relay.LlmRequestInterceptOutcome@1`.
- **Public C FFI:** return one owned `char **out_outcome_json` and add `nemo_relay_llm_request_intercept_outcome_json_new`.
- **Go:** return `(LLMRequestInterceptOutcome, error)` and expose request, outcome, and pending-mark DTOs.
- **Python:** return `LLMRequestInterceptOutcome` and export `PendingMarkSpec`.
- **Node.js and WebAssembly:** return `{ request, annotated?, pendingMarks? }`. Binding-owned pending-mark DTOs use `categoryProfile`; canonical event and outcome JSON retains `category_profile`.
- **Rust native and worker SDKs:** expose only the canonical callback and registration method.

#### Breaking changes

This intentionally finalizes unpublished contracts in place:

- Rust and Python tuple results are removed.
- C and Go split outputs are removed.
- Mark-specific parallel registration variants are removed.
- The native annotation metadata envelope and fallback parser are removed.
- Native ABI host and plugin tables require the finalized outcome-contract field.
- The `grpc-v1` request-intercept result is replaced by the canonical outcome envelope.
- Codec-path intercepts must return an annotation and may no longer mutate raw `request.content`; malformed outcomes fail before lifecycle creation.
- Node.js and WebAssembly pending-mark objects use `categoryProfile` instead of the Rust/wire name `category_profile`.

All development native plugins and workers must rebuild against this version.

#### Where should the reviewer start?

1. `crates/types/src/api/event.rs` and `crates/types/src/api/llm.rs` for the canonical data contract.
2. `crates/core/src/api/runtime/state.rs`, `crates/core/src/api/shared.rs`, `crates/core/src/api/llm.rs`, and `crates/core/src/stream.rs` for chaining, codec authority, validation, and lifecycle behavior.
3. `crates/plugin/src/lib.rs`, `crates/core/src/plugin/dynamic/native.rs`, and `crates/core/src/plugin/dynamic/worker.rs` for native and worker boundaries.
4. `crates/ffi`, `go/nemo_relay`, `crates/python`, `crates/node`, and `crates/wasm` for binding contracts and DTO conversion.
5. `crates/core/tests/integration/middleware_tests.rs`, `crates/core/tests/integration/pipeline_tests.rs`, `crates/plugin/tests/typed_callbacks.rs`, and the binding tests for lifecycle, codec-authority, and boundary coverage.

The full contract, request-authority diagram, and migration notes are tracked in [companion documentation PR #341](#341), which should merge immediately after this PR.

#### Testing

- `cargo test --workspace --all-targets`
- `cargo clippy --workspace --all-targets -- -D warnings`
- `cargo fmt --all -- --check`
- Python codec and worker SDK coverage passes, including malformed codec-path outcomes and canonical worker envelopes.
- Node.js LLM suite: **38 passed**, including `categoryProfile` input/output conversion and codec-authority rejection.
- Go: all `go/nemo_relay/...` packages passed, including codec-authority coverage; `go vet ./...` passes.
- Native SDK: **52 passed**.
- Worker SDK: **9 passed**; worker protocol tests: **6 passed**.
- C FFI: unit and integration suites passed, including owned outcome allocation and malformed/null input coverage.
- WebAssembly native Rust tests: **13 passed**, including camelCase pending-mark DTO round trips and rejection of the wire-only `category_profile` spelling.
- Repository formatting, strict Clippy, Ruff, Prettier, type, lockfile, FFI-header, and applicable pre-commit checks pass.

`wasm-pack` and the `wasm32-unknown-unknown` Rust target were not available for the package-level Wasm suite. Environment-dependent socket and external-network tests were not used to validate these binding changes.

#### Related Issues

- Relates to #296

## Summary by CodeRabbit

* **New Features**
  * LLM request intercepts can now return a unified outcome that includes the rewritten request, optional annotated request, and pending marks.
  * Pending marks are now emitted alongside LLM lifecycle events and supported across SDKs and plugins.
* **Bug Fixes**
  * Improved consistency of LLM event timing and parent/child relationships.
  * Added stricter validation so intercepts that modify raw request content or omit required annotations are rejected when needed.

Authors:
- Bryan Bednarski (https://github.com/bbednarski9)

Approvers:
- Will Killian (https://github.com/willkill07)

URL: #327
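The chaining and timing rules from the #327 description (each accepted request feeds the next intercept, pending marks accumulate in middleware order, all marks are discarded if any intercept fails, and events are emitted as start at `T`, marks at `T + 1µs`, end at or after `T + 1µs`) can be sketched as follows. Everything here is a simplified assumption for illustration: `Outcome`, `run_intercepts`, and `schedule_events` are hypothetical names, marks are reduced to their names, timestamps are integer microseconds, and the chain-breaking case is left out because the description does not say how an intercept signals it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Outcome:
    # Hypothetical, reduced outcome: a request plus the names of pending marks.
    request: dict[str, Any]
    pending_marks: list[str] = field(default_factory=list)


Intercept = Callable[[dict[str, Any]], Outcome]


def run_intercepts(
    request: dict[str, Any], intercepts: list[Intercept]
) -> tuple[dict[str, Any], list[str]]:
    """Thread the request through each intercept, collecting marks in order.

    If any intercept raises, the exception propagates and the locally
    accumulated marks are dropped, so nothing is ever emitted for them.
    """
    marks: list[str] = []
    for intercept in intercepts:
        outcome = intercept(request)
        request = outcome.request  # the accepted request feeds the next intercept
        marks.extend(outcome.pending_marks)
    return request, marks


def schedule_events(start_us: int, marks: list[str], end_us: int) -> list[tuple[int, str]]:
    """Lay out lifecycle events: start at T, marks at T+1us, end at or after T+1us."""
    events = [(start_us, "llm_start")]
    events += [(start_us + 1, f"mark:{name}") for name in marks]
    events.append((max(end_us, start_us + 1), "llm_end"))
    return events


def add_header(req: dict[str, Any]) -> Outcome:
    return Outcome(request={**req, "x-trace": "1"}, pending_marks=["header_added"])


def tag(req: dict[str, Any]) -> Outcome:
    return Outcome(request=req, pending_marks=["tagged", "audited"])


final_request, marks = run_intercepts({"model": "m"}, [add_header, tag])
events = schedule_events(100, marks, end_us=100)
```

Clamping the end timestamp with `max` is one way to express "at or after `T + 1µs`": a provider call that returns immediately still ends no earlier than the marks.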

lvojtku · 2026-07-01T20:17:25Z

+{/* SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0 */}
+
+Every LLM request intercept returns one canonical outcome:


Can you provide a description about what this does? Is this the LLM request intercept or the outcome?

lvojtku · 2026-07-01T20:17:56Z

+
+`request` is required. `annotated_request` defaults to `null` when omitted on
+input, and `pending_marks` defaults to an empty list. Canonical serialization
+includes all three fields. A pending mark contains only `name`, optional


Suggested change

-includes all three fields. A pending mark contains only `name`, optional
+includes all three fields. A pending mark only contains `name`, optional

lvojtku · 2026-07-01T20:18:14Z

+
+## Request Authority
+
+The provider-body source of truth depends only on whether a request codec is


Suggested change

-The provider-body source of truth depends only on whether a request codec is
+The provider-body source of truth only depends on whether a request codec is

lvojtku · 2026-07-01T20:18:53Z

+annotation, including its flattened `extra` fields for provider-specific data.
+Relay rejects a changed raw body or missing annotation at the offending
+intercept before invoking later middleware or creating an LLM lifecycle.
+


What does the following snippet do?

Should begin with "The following example describes/does xyz..."

lvojtku · 2026-07-01T20:22:04Z

+Python callbacks return `LLMRequestInterceptOutcome`; Rust callbacks return
+`LlmRequestInterceptOutcome`; Go callbacks return
+`LLMRequestInterceptOutcome`; and Node.js and WebAssembly callbacks return
+`{ request, annotated?, pendingMarks? }`, with `categoryProfile` on each
+JavaScript pending-mark DTO. The canonical JSON forms retain `pending_marks`
+and `category_profile`. Public C callbacks write one owned canonical outcome
+JSON string. Native ABI v1 uses one host-owned outcome JSON string. Rust and
+Python `grpc-v1` worker SDKs return their canonical outcome type in a
+`JsonEnvelope` whose schema is
+`nemo.relay.LlmRequestInterceptOutcome@1`.


Suggested change

-Python callbacks return `LLMRequestInterceptOutcome`; Rust callbacks return
-`LlmRequestInterceptOutcome`; Go callbacks return
-`LLMRequestInterceptOutcome`; and Node.js and WebAssembly callbacks return
-`{ request, annotated?, pendingMarks? }`, with `categoryProfile` on each
-JavaScript pending-mark DTO. The canonical JSON forms retain `pending_marks`
-and `category_profile`. Public C callbacks write one owned canonical outcome
-JSON string. Native ABI v1 uses one host-owned outcome JSON string. Rust and
-Python `grpc-v1` worker SDKs return their canonical outcome type in a
-`JsonEnvelope` whose schema is
-`nemo.relay.LlmRequestInterceptOutcome@1`.
+The following are callbacks and what they return:
+
+- Python callbacks return `LLMRequestInterceptOutcome`
+- Rust callbacks return `LlmRequestInterceptOutcome`
+- Go callbacks return `LLMRequestInterceptOutcome`
+- Node.js and WebAssembly callbacks return`{ request, annotated?, pendingMarks? }`, with `categoryProfile` on each JavaScript pending-mark DTO.
+
+The canonical JSON forms retain `pending_marks` and `category_profile`. Public C callbacks write one owned canonical outcome JSON string. Native ABI v1 uses one host-owned outcome JSON string. Rust and Python `grpc-v1` worker SDKs return their canonical outcome type in a
+`JsonEnvelope` whose schema is `nemo.relay.LlmRequestInterceptOutcome@1`.

lvojtku · 2026-07-01T20:22:26Z

+## Managed Lifecycle
+
+Managed execution runs all effective global and scope-local intercepts before
+creating the LLM handle. Each accepted request/annotation pair feeds the next


Suggested change

-creating the LLM handle. Each accepted request/annotation pair feeds the next
+creating the LLM handle. Each accepted request or annotation pair feeds the next

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

docs: document LLM request intercept outcomes

279e53e

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

github-actions Bot added the size:M and Documentation labels Jul 1, 2026

copy-pr-bot Bot temporarily deployed to fern July 1, 2026 13:59 Inactive

bbednarski9 mentioned this pull request Jul 1, 2026

feat(plugin)!: support pending marks from LLM intercepts #327

Merged

2 tasks

willkill07 added this to the 0.5 milestone Jul 1, 2026

willkill07 assigned bbednarski9 Jul 1, 2026

bbednarski9 marked this pull request as ready for review July 1, 2026 17:59

bbednarski9 requested a review from lvojtku as a code owner July 1, 2026 17:59

coderabbitai Bot added the DO NOT MERGE label Jul 1, 2026

lvojtku reviewed Jul 1, 2026


willkill07 removed the DO NOT MERGE label Jul 1, 2026

bbednarski9 mentioned this pull request Jul 2, 2026

feat(plugin)!: support pending marks from tool execution intercepts #350

Open

docs: document tool execution intercept outcomes

93f37b1

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

bbednarski9 changed the title ~~docs: document LLM request intercept outcomes~~ docs: document LLM request and tool execution outcomes Jul 2, 2026

copy-pr-bot Bot deployed to fern July 2, 2026 04:43 Active

bbednarski9 wants to merge 2 commits into NVIDIA:main from bbednarski9:docs/llm-intercept-pending-marks

3 participants
