
docs: document LLM request and tool execution outcomes #341

Open
bbednarski9 wants to merge 2 commits into NVIDIA:main from bbednarski9:docs/llm-intercept-pending-marks

Conversation

@bbednarski9

@bbednarski9 bbednarski9 commented Jul 1, 2026

Contributor

Overview

Document the canonical LLM request-intercept and tool-execution intercept outcome contracts.

  • I confirm this contribution is my own work, or I have the right to submit it under this project's license.
  • I searched existing issues and open pull requests, and this does not duplicate existing work.

Details

  • Resolve reviewer feedback in the LLM request-intercept outcome reference with clearer purpose, lifecycle, diagram, and binding-contract explanations.
  • Add a parallel tool-execution outcome reference covering raw next(args) behavior, pending marks, end-before-mark lifecycle ordering, migration, and binding contracts.
  • Update the Python, Node.js, and Rust tool execution middleware examples to return the canonical outcome.
  • Keep this PR documentation-only; it contains no runtime changes.

Where should the reviewer start?

  • docs/reference/llm-request-intercept-outcomes.mdx
  • docs/reference/tool-execution-intercept-outcomes.mdx
  • docs/instrument-applications/advanced-guide.mdx

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Validation

  • Generated Python, Node.js, and Rust API reference pages.
  • Fern check completed with zero errors; redirects validation was skipped because no Fern token is configured.
  • Fern strict broken-link validation passed.
  • git diff --check

Summary by CodeRabbit

  • Documentation
    • Added a new reference page defining the canonical format and lifecycle rules for request-intercept outcomes.
    • Added a new reference page defining the canonical format and lifecycle rules for tool-execution intercept outcomes.
    • Updated request-intercept and middleware/tool-policy examples across Python, Rust, and Node.js to use outcome objects and the revised return shapes.
    • Clarified codec-aware behavior, validation/error handling, and how rewritten requests/annotations must be derived in managed execution.

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
@coderabbitai

coderabbitai Bot commented Jul 1, 2026


Walkthrough

Documentation now standardizes LLM request-intercept outcomes and tool-execution intercept outcomes, updates example code to use outcome objects, and adds new reference pages covering serialization, lifecycle, binding, and migration rules.

Changes

Intercept Outcome Documentation

| Layer | File(s) | Summary |
|---|---|---|
| Canonical LLM request-intercept reference | `docs/reference/llm-request-intercept-outcomes.mdx` | Defines the canonical outcome fields, request authority rules, intercept resolution flow, language/ABI mappings, lifecycle timing, and migration notes for LLM request intercepts. |
| Canonical tool-intercept reference | `docs/reference/tool-execution-intercept-outcomes.mdx` | Defines the canonical tool execution outcome shape, continuation semantics, lifecycle event ordering, binding mappings, and migration guidance. |
| Build-plugins examples updated to outcome return type | `docs/build-plugins/code-examples.mdx`, `docs/build-plugins/register-behavior.mdx` | Python and Rust `add_header` intercept examples now return `LLMRequestInterceptOutcome` or `LlmRequestInterceptOutcome` objects instead of tuples. |
| Consumer-side outcome usage | `docs/instrument-applications/code-examples.mdx`, `docs/integrate-into-frameworks/code-examples.mdx` | Python, Node.js, and Rust examples now capture intercept results as `outcome` and read `outcome.request` before conditional execution or downstream use. |
| Provider-codecs workflow and example updates | `docs/integrate-into-frameworks/provider-codecs.mdx` | Documents request-codec request authority and rejection behavior, and updates the Python example to return `LLMRequestInterceptOutcome` objects. |

Estimated code review effort: 2 (Simple) | ~12 minutes

Possibly related PRs

  • NVIDIA/NeMo-Relay#327: Covers the same intercept outcome contract shift, but through runtime/FFI and payload-shape changes rather than documentation updates.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title follows Conventional Commits and accurately summarizes the documentation-only changes. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description matches the required template sections and includes overview, details, reviewer start points, related issues, and checklist items. |

@github-actions

github-actions Bot commented Jul 1, 2026


@willkill07 willkill07 added this to the 0.5 milestone Jul 1, 2026
rapids-bot Bot pushed a commit that referenced this pull request Jul 1, 2026
#### Overview

Finalize one canonical LLM request-intercept outcome across the Rust runtime, built-in and adaptive plugins, native ABI v1, `grpc-v1` workers, public C FFI, Go, Python, Node.js, and WebAssembly.

Request intercepts can rewrite the provider request, carry an optional normalized annotation, and schedule ordered marks for the managed LLM lifecycle:

```json
{
  "request": {"headers": {}, "content": {}},
  "annotated_request": null,
  "pending_marks": []
}
```

`request` is required. `annotated_request` defaults to `null`, and `pending_marks` defaults to an empty list. Each pending mark contains only its name, optional category and category profile, data, and metadata; Relay continues to own event UUIDs, parent UUIDs, and timestamps.

The finalized contract also defines one provider-body source of truth. Without a request codec, `outcome.request.content` is authoritative. With a codec, `outcome.annotated_request` is required and authoritative, `outcome.request.content` is read-only context, and `outcome.request.headers` remains writable.
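
The defaults and the authority rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Outcome`, `parse_outcome`, and `provider_body_source` are hypothetical names made up for this sketch, not the `nemo_relay` Python binding, and the sketch models only what the two paragraphs above say.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Outcome:
    # Hypothetical stand-in for the canonical outcome JSON, not the binding type.
    request: dict                             # required
    annotated_request: Optional[dict] = None  # defaults to null
    pending_marks: list = field(default_factory=list)  # defaults to []


def parse_outcome(raw: str) -> Outcome:
    """Parse outcome JSON, applying the documented defaults."""
    doc = json.loads(raw)
    if "request" not in doc:
        raise ValueError("'request' is required")
    return Outcome(
        request=doc["request"],
        annotated_request=doc.get("annotated_request"),
        pending_marks=doc.get("pending_marks") or [],
    )


def provider_body_source(outcome: Outcome, has_codec: bool) -> dict:
    """Return the object that is authoritative for the provider body."""
    if not has_codec:
        # No request codec: the raw request content is the source of truth.
        return outcome.request["content"]
    if outcome.annotated_request is None:
        # With a codec the annotation is required.
        raise ValueError("codec path requires 'annotated_request'")
    # With a codec the annotation is authoritative; request.content is
    # read-only context and only request.headers stays writable.
    return outcome.annotated_request
```

For example, parsing `{"request": {"headers": {}, "content": {}}}` yields an outcome whose `annotated_request` is `None` and whose `pending_marks` is an empty list.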

- [x] I confirm this contribution is my own work, or I have the right to submit it under this project's license.
- [x] I searched existing issues and open pull requests, and this does not duplicate existing work.

#### Why

Request intercepts run before Relay creates the managed LLM handle. A mark emitted directly from an intercept therefore cannot reliably attach to that future LLM scope. Returning pending mark specifications lets the lifecycle owner emit them at the correct boundary without leaking control data into provider requests, annotations, codecs, sanitizers, or execution intercepts.

Codec-aware interception also previously allowed two conflicting provider-body representations: an intercept could change both the raw request content and its normalized annotation, while Relay later encoded only the annotation. Making authority explicit prevents raw content edits from being silently discarded.

#### Details

- Make `LlmRequestInterceptOutcome` the only Rust callback result and keep one `register_llm_request_intercept` registration family for global, scope-local, plugin-context, and adaptive paths.
- Propagate each accepted request and annotation to the next intercept while appending pending marks in effective middleware order.
- Without a request codec, use `outcome.request.content` as the provider body.
- With a request codec, require `outcome.annotated_request`, encode the provider body from it, and allow header changes only through `outcome.request.headers`.
- Reject raw `request.content` mutations or missing annotations at the offending codec-path intercept, before later middleware, LLM lifecycle creation, mark emission, or provider invocation.
- Preserve marks from an intercept that breaks the chain; discard all accumulated marks if any intercept fails.
- Return the complete outcome from standalone request-intercept helpers. These helpers expose pending marks but do not emit them because they do not own an LLM lifecycle.
- After successful interception, create the LLM handle and capture one subscriber snapshot before emitting lifecycle events.
- Emit LLM start at `T`, every pending mark at `T + 1µs` in returned order with the LLM UUID as parent, and LLM end at or after `T + 1µs`.
- Apply the same behavior to streaming and non-streaming managed execution, including provider errors and stream finalization.
- Keep pending marks separate from provider-visible requests and annotations.
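
The chaining, codec-path validation, and timing rules listed above can be modeled roughly as follows. This is a simplified Python sketch, not the Rust runtime: `Outcome`, `run_chain`, and `lifecycle_events` are hypothetical names, timestamps are plain integer microseconds, marks are plain dictionaries, and chain-breaking intercepts, subscriber snapshots, and UUID assignment are not modeled.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Outcome:
    # Hypothetical stand-in for the canonical outcome, not the binding type.
    request: dict
    annotated_request: Optional[dict] = None
    pending_marks: list = field(default_factory=list)


def run_chain(intercepts, request, annotation, has_codec):
    """Feed each accepted request/annotation to the next intercept and
    append pending marks in middleware order."""
    marks = []
    for intercept in intercepts:
        content_before = copy.deepcopy(request["content"])
        outcome = intercept(copy.deepcopy(request), copy.deepcopy(annotation))
        if has_codec:
            # Reject at the offending intercept, before later middleware runs.
            if outcome.annotated_request is None:
                raise ValueError("codec path requires annotated_request")
            if outcome.request["content"] != content_before:
                raise ValueError("codec path forbids raw request.content edits")
        request, annotation = outcome.request, outcome.annotated_request
        marks.extend(outcome.pending_marks)
    # If any intercept raised, `marks` is dropped along with the stack frame,
    # which models "discard all accumulated marks if any intercept fails".
    return Outcome(request, annotation, marks)


def lifecycle_events(start_us, provider_end_us, marks):
    """Return (kind, name, timestamp_us) tuples in emission order."""
    mark_ts = start_us + 1                   # every mark at T + 1 microsecond
    events = [("start", "llm", start_us)]    # LLM start at T
    events += [("mark", m["name"], mark_ts) for m in marks]  # returned order
    events.append(("end", "llm", max(provider_end_us, mark_ts)))  # >= T + 1
    return events
```

A standalone request-intercept helper would correspond to calling only `run_chain` and returning its result: the marks are exposed but never passed to `lifecycle_events`, because no LLM lifecycle exists to emit them into.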

#### Boundary contracts

- **Native ABI v1:** return one host-owned outcome JSON string. Remove the private annotation-envelope transport and append required outcome-contract version fields to both host and plugin descriptor tables so stale binaries fail before callback invocation.
- **`grpc-v1`:** return one `JsonEnvelope` using schema `nemo.relay.LlmRequestInterceptOutcome@1`.
- **Public C FFI:** return one owned `char **out_outcome_json` and add `nemo_relay_llm_request_intercept_outcome_json_new`.
- **Go:** return `(LLMRequestInterceptOutcome, error)` and expose request, outcome, and pending-mark DTOs.
- **Python:** return `LLMRequestInterceptOutcome` and export `PendingMarkSpec`.
- **Node.js and WebAssembly:** return `{ request, annotated?, pendingMarks? }`. Binding-owned pending-mark DTOs use `categoryProfile`; canonical event and outcome JSON retains `category_profile`.
- **Rust native and worker SDKs:** expose only the canonical callback and registration method.
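
To make the naming split between the JavaScript-style binding DTOs and the canonical JSON concrete, here is a small Python sketch of the field mapping. It only illustrates the renames described above (`annotated` / `annotated_request`, `pendingMarks` / `pending_marks`, `categoryProfile` / `category_profile`). The helper names are hypothetical, and the real conversion lives in the Node.js and WebAssembly bindings, not in Python.

```python
# Wire (canonical JSON) name -> binding DTO name for pending-mark fields.
_MARK_RENAMES = {"category_profile": "categoryProfile"}
_MARK_RENAMES_INVERSE = {v: k for k, v in _MARK_RENAMES.items()}


def mark_to_binding(mark: dict) -> dict:
    """Convert a canonical pending mark to the camelCase DTO shape."""
    return {_MARK_RENAMES.get(key, key): value for key, value in mark.items()}


def mark_to_wire(dto: dict) -> dict:
    """Convert a camelCase DTO back to the canonical pending-mark shape."""
    if "category_profile" in dto:
        # Binding DTOs reject the wire-only spelling.
        raise ValueError("use categoryProfile, not category_profile")
    return {_MARK_RENAMES_INVERSE.get(key, key): value for key, value in dto.items()}


def outcome_to_wire(js_outcome: dict) -> dict:
    """Map { request, annotated?, pendingMarks? } to canonical outcome JSON."""
    return {
        "request": js_outcome["request"],
        "annotated_request": js_outcome.get("annotated"),
        "pending_marks": [mark_to_wire(m) for m in js_outcome.get("pendingMarks", [])],
    }
```

Keeping the rename table in one place means the round trip `mark_to_wire(mark_to_binding(mark))` returns the original canonical mark.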

#### Breaking changes

This intentionally finalizes unpublished contracts in place:

- Rust and Python tuple results are removed.
- C and Go split outputs are removed.
- Mark-specific parallel registration variants are removed.
- The native annotation metadata envelope and fallback parser are removed.
- Native ABI host and plugin tables require the finalized outcome-contract field.
- The `grpc-v1` request-intercept result is replaced by the canonical outcome envelope.
- Codec-path intercepts must return an annotation and may no longer mutate raw `request.content`; malformed outcomes fail before lifecycle creation.
- Node.js and WebAssembly pending-mark objects use `categoryProfile` instead of the Rust/wire name `category_profile`.

All development native plugins and workers must rebuild against this version.

#### Where should the reviewer start?

1. `crates/types/src/api/event.rs` and `crates/types/src/api/llm.rs` for the canonical data contract.
2. `crates/core/src/api/runtime/state.rs`, `crates/core/src/api/shared.rs`, `crates/core/src/api/llm.rs`, and `crates/core/src/stream.rs` for chaining, codec authority, validation, and lifecycle behavior.
3. `crates/plugin/src/lib.rs`, `crates/core/src/plugin/dynamic/native.rs`, and `crates/core/src/plugin/dynamic/worker.rs` for native and worker boundaries.
4. `crates/ffi`, `go/nemo_relay`, `crates/python`, `crates/node`, and `crates/wasm` for binding contracts and DTO conversion.
5. `crates/core/tests/integration/middleware_tests.rs`, `crates/core/tests/integration/pipeline_tests.rs`, `crates/plugin/tests/typed_callbacks.rs`, and the binding tests for lifecycle, codec-authority, and boundary coverage.

The full contract, request-authority diagram, and migration notes are tracked in [companion documentation PR #341](#341), which should merge immediately after this PR.

#### Testing

- `cargo test --workspace --all-targets`
- `cargo clippy --workspace --all-targets -- -D warnings`
- `cargo fmt --all -- --check`
- Python codec and worker SDK coverage passes, including malformed codec-path outcomes and canonical worker envelopes.
- Node.js LLM suite: **38 passed**, including `categoryProfile` input/output conversion and codec-authority rejection.
- Go: all `go/nemo_relay/...` packages passed, including codec-authority coverage; `go vet ./...` passes.
- Native SDK: **52 passed**.
- Worker SDK: **9 passed**; worker protocol tests: **6 passed**.
- C FFI: unit and integration suites passed, including owned outcome allocation and malformed/null input coverage.
- WebAssembly native Rust tests: **13 passed**, including camelCase pending-mark DTO round trips and rejection of the wire-only `category_profile` spelling.
- Repository formatting, strict Clippy, Ruff, Prettier, type, lockfile, FFI-header, and applicable pre-commit checks pass.

`wasm-pack` and the `wasm32-unknown-unknown` Rust target were not available for the package-level Wasm suite. Environment-dependent socket and external-network tests were not used to validate these binding changes.

#### Related Issues

- Relates to #296



## Summary by CodeRabbit

* **New Features**
  * LLM request intercepts can now return a unified outcome that includes the rewritten request, optional annotated request, and pending marks.
  * Pending marks are now emitted alongside LLM lifecycle events and supported across SDKs and plugins.

* **Bug Fixes**
  * Improved consistency of LLM event timing and parent/child relationships.
  * Added stricter validation so intercepts that modify raw request content or omit required annotations are rejected when needed.

Authors:
  - Bryan Bednarski (https://github.com/bbednarski9)

Approvers:
  - Will Killian (https://github.com/willkill07)

URL: #327
@bbednarski9 bbednarski9 marked this pull request as ready for review July 1, 2026 17:59
@bbednarski9 bbednarski9 requested a review from lvojtku as a code owner July 1, 2026 17:59
@coderabbitai coderabbitai Bot added the DO NOT MERGE PR should not be merged; see PR for details label Jul 1, 2026
{/* SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0 */}

Every LLM request intercept returns one canonical outcome:

Can you provide a description about what this does? Is this the LLM request intercept or the outcome?


`request` is required. `annotated_request` defaults to `null` when omitted on
input, and `pending_marks` defaults to an empty list. Canonical serialization
includes all three fields. A pending mark contains only `name`, optional

Suggested change
includes all three fields. A pending mark contains only `name`, optional
includes all three fields. A pending mark only contains `name`, optional


## Request Authority

The provider-body source of truth depends only on whether a request codec is

Suggested change
The provider-body source of truth depends only on whether a request codec is
The provider-body source of truth only depends on whether a request codec is

annotation, including its flattened `extra` fields for provider-specific data.
Relay rejects a changed raw body or missing annotation at the offending
intercept before invoking later middleware or creating an LLM lifecycle.

What does the following snippet do?

Should begin with "The following example describes/does xyz..."

Comment on lines +60 to +69
Python callbacks return `LLMRequestInterceptOutcome`; Rust callbacks return
`LlmRequestInterceptOutcome`; Go callbacks return
`LLMRequestInterceptOutcome`; and Node.js and WebAssembly callbacks return
`{ request, annotated?, pendingMarks? }`, with `categoryProfile` on each
JavaScript pending-mark DTO. The canonical JSON forms retain `pending_marks`
and `category_profile`. Public C callbacks write one owned canonical outcome
JSON string. Native ABI v1 uses one host-owned outcome JSON string. Rust and
Python `grpc-v1` worker SDKs return their canonical outcome type in a
`JsonEnvelope` whose schema is
`nemo.relay.LlmRequestInterceptOutcome@1`.

Suggested change
Python callbacks return `LLMRequestInterceptOutcome`; Rust callbacks return
`LlmRequestInterceptOutcome`; Go callbacks return
`LLMRequestInterceptOutcome`; and Node.js and WebAssembly callbacks return
`{ request, annotated?, pendingMarks? }`, with `categoryProfile` on each
JavaScript pending-mark DTO. The canonical JSON forms retain `pending_marks`
and `category_profile`. Public C callbacks write one owned canonical outcome
JSON string. Native ABI v1 uses one host-owned outcome JSON string. Rust and
Python `grpc-v1` worker SDKs return their canonical outcome type in a
`JsonEnvelope` whose schema is
`nemo.relay.LlmRequestInterceptOutcome@1`.
The following are callbacks and what they return:
- Python callbacks return `LLMRequestInterceptOutcome`
- Rust callbacks return `LlmRequestInterceptOutcome`
- Go callbacks return `LLMRequestInterceptOutcome`
- Node.js and WebAssembly callbacks return `{ request, annotated?, pendingMarks? }`, with `categoryProfile` on each JavaScript pending-mark DTO.
The canonical JSON forms retain `pending_marks` and `category_profile`. Public C callbacks write one owned canonical outcome JSON string. Native ABI v1 uses one host-owned outcome JSON string. Rust and Python `grpc-v1` worker SDKs return their canonical outcome type in a
`JsonEnvelope` whose schema is `nemo.relay.LlmRequestInterceptOutcome@1`.

## Managed Lifecycle

Managed execution runs all effective global and scope-local intercepts before
creating the LLM handle. Each accepted request/annotation pair feeds the next

Suggested change
creating the LLM handle. Each accepted request/annotation pair feeds the next
creating the LLM handle. Each accepted request or annotation pair feeds the next

@willkill07 willkill07 removed the DO NOT MERGE PR should not be merged; see PR for details label Jul 1, 2026
Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
@bbednarski9 bbednarski9 changed the title docs: document LLM request intercept outcomes docs: document LLM request and tool execution outcomes Jul 2, 2026
Labels

Documentation documentation-related size:M PR is medium


3 participants