
docs: document the Gym to RL framework token-ID data interface #1554

Merged
ananthsub merged 2 commits into NVIDIA-NeMo:main from ananthsub:ananthsub/docs-gym-rl-data-contract on Jun 16, 2026

@ananthsub

Contributor

Follow up to #1545

Add a Data Interface section to the on-policy corrections page. It describes which token-ID fields the model server returns during training (prompt_token_ids, generation_token_ids, generation_log_probs), where they attach, and the rule that the message-level token IDs are the single source of truth: the model server produces them once, they propagate turn-to-turn on the messages, and callers must not construct or inject prefix token IDs out of band.
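
As a rough illustration of the rule described above, here is a minimal Python sketch. Only the three field names come from this PR; the `Message` class, the `prefix_from_messages` and `check_generation` helpers, the choice to attach the fields to assistant messages, and the one-log-prob-per-generated-token check are all assumptions made for illustration, not the actual Gym or RL framework schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    """Hypothetical chat message; the real schema may differ."""
    role: str
    content: str
    # Field names from the PR description. In this sketch they are only
    # ever set by the model server, never by the caller.
    prompt_token_ids: Optional[List[int]] = None
    generation_token_ids: Optional[List[int]] = None
    generation_log_probs: Optional[List[float]] = None


def check_generation(msg: Message) -> None:
    """Assumed sanity check: all three fields present, one log prob per token."""
    if (msg.prompt_token_ids is None
            or msg.generation_token_ids is None
            or msg.generation_log_probs is None):
        raise ValueError("assistant message is missing token-ID fields")
    if len(msg.generation_token_ids) != len(msg.generation_log_probs):
        raise ValueError("generation_token_ids and generation_log_probs differ in length")


def prefix_from_messages(messages: List[Message]) -> List[int]:
    """Read the prefix token IDs off the most recent assistant message.

    The prefix is taken from the messages themselves rather than being
    re-tokenized or passed in separately (that is, not out of band).
    """
    for msg in reversed(messages):
        if msg.role == "assistant" and msg.generation_token_ids is not None:
            check_generation(msg)
            return list(msg.prompt_token_ids) + list(msg.generation_token_ids)
    return []  # first turn: no prior generation to reuse


# Usage: turn 1 came back from the (hypothetical) model server with token IDs attached.
history = [
    Message(role="user", content="hi"),
    Message(
        role="assistant",
        content="hello",
        prompt_token_ids=[1, 2, 3],
        generation_token_ids=[4, 5],
        generation_log_probs=[-0.1, -0.2],
    ),
    Message(role="user", content="and then?"),
]
prefix = prefix_from_messages(history)
```

In this sketch the prefix for the next turn is recovered purely from the message history, which is one way to keep the message-level token IDs as the single source of truth.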

@ananthsub added the documentation (Improvements to documentation) and training (Training framework integrations) labels Jun 10, 2026
@copy-pr-bot

copy-pr-bot Bot commented Jun 10, 2026


Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@ananthsub ananthsub marked this pull request as ready for review June 10, 2026 16:15
Add a Data Interface section to the on-policy corrections page describing
what token-ID fields the model server returns during training
(prompt_token_ids, generation_token_ids, generation_log_probs), where they
attach, and the rule that the message-level token IDs are the single source
of truth: they are produced once by the model server and propagated
turn-to-turn on the messages, and callers must not construct or inject
prefix token IDs out of band.

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
@ananthsub ananthsub force-pushed the ananthsub/docs-gym-rl-data-contract branch from 28f87d5 to a60d730 Compare June 10, 2026 17:22
@copy-pr-bot

copy-pr-bot Bot commented Jun 10, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


@ananthsub ananthsub merged commit 604cc86 into NVIDIA-NeMo:main Jun 16, 2026
9 checks passed

Labels

documentation: Improvements to documentation
training: Training framework integrations


3 participants