fix(vllm_model): preserve required_prefix_token_ids by ajaymittur · Pull Request #1545 · NVIDIA-NeMo/Gym

ajaymittur · 2026-06-06T08:47:32Z

Summary

Preserve required_prefix_token_ids when Gym proxies Chat Completions requests to vLLM and when it falls back to /tokenize to recover prompt token IDs.

This field can be passed through OpenAI extra_body by RL agents that need exact multi-turn token continuity. Without declaring the field in Gym's request model, Pydantic drops it during model_dump(exclude_unset=True). Without forwarding it to /tokenize, Gym can still compute unrepaired prompt token IDs even if generation used the repaired prefix.

Why

NeMo-RL checks that generated multi-turn trajectories remain token-contiguous with previous messages. For tool-use trajectories, decoded assistant/tool-call text can retokenize differently from the original generated token IDs.

NeMo-RL's vLLM OpenAI preprocessing hook uses _replace_prefix_tokens to splice the exact prior model-emitted prefix back into the templated prompt. Gym needs to preserve the externally supplied required_prefix_token_ids field so that both generation and tokenize fallback use the same prefix repair path.

Changes

Add required_prefix_token_ids to NeMoGymChatCompletionCreateParamsNonStreaming.
Forward required_prefix_token_ids into the vLLM /tokenize fallback used when top-level prompt_token_ids are absent from the chat completion response.
Add focused tests for schema preservation and tokenize fallback forwarding.

Thanks for taking a look. Yes gym does not send the field in the top level and the API between Nemo Gym and Nemo RL was a little bit unclear since the field is not set as a private or internal variable (like _required_prefix_token_ids) in Nemo RL. Motivation for this MR was sending required_prefix_token_ids from my custom harness in the Gym and still hitting the nemo Rl assertion you linked.

Discussed this with @bxyu-nvidia offline and think it would be better to keep a single source of origin ( prompt_token_ids and generation_token_ids ) for Nemo RL to derive the prefix from. It looks like #1294 attempts to auto derive it primarily for dynamo

Happy to close this if we don't want to allow sending required_prefix_token_ids in the Nemo gym chat completion response to Nemo RL

cmunley1 · 2026-06-09T04:43:22Z

Thanks @ajaymittur

Motivation for this MR was sending required_prefix_token_ids from my custom harness

Are you saying that your harness does not inject the fields prompt_token_ids and generation_token_ids to requests so the assertion is hit?

Here is an example of how we have to similarly handle this for other harness such as Hermes Agent NousResearch/hermes-agent@main...nemo-gym-changes

I do think we need a better solution than these changes on all harnesses (as it is not possible for all, such as claude code), maybe solution is token id caching or auto derive or something

ajaymittur · 2026-06-09T18:02:23Z

Are you saying that your harness does not inject the fields prompt_token_ids and generation_token_ids to requests so the assertion is hit?

My harness was using them to construct required_prefix_token_ids and send that to Nemo RL since the model_post_init hook there made me think prompt_token_ids and generation_token_ids would be used as fallbacks if required_prefix_token_ids was None. No other reason really.

I do like the token caching and auto derive directions but that would probably require a larger design discussion. I think ProRL does something similar with their ManagedSession which makes integrating harnesses easy according to their docs https://github.com/NVIDIA-NeMo/ProRL-Agent-Server/tree/stable/src/polar/agent#the-harness-contract

We can also make the API interface between Nemo RL and Gym clearer in the docs.

ajaymittur · 2026-06-10T17:20:38Z

Closing in favor of #1554

nemo-automation-bot Bot added the community-request Issue reported or requested by someone from the community label Jun 6, 2026

fix(vllm_model): preserve required_prefix_token_ids

46bc5c0

Signed-off-by: Ajay Mittur <amittur@nvidia.com>

ajaymittur force-pushed the fix/required-prefix-token-ids-forwarding branch from 075781e to 46bc5c0 Compare June 6, 2026 08:52

svcnvidia-nemo-ci added the waiting-on-maintainers Waiting on maintainers to respond label Jun 8, 2026

cmunley1 requested a review from bxyu-nvidia June 8, 2026 23:57

svcnvidia-nemo-ci added waiting-on-customer Waiting on the original author to respond and removed waiting-on-maintainers Waiting on maintainers to respond labels Jun 9, 2026

svcnvidia-nemo-ci removed the waiting-on-customer Waiting on the original author to respond label Jun 9, 2026

ananthsub mentioned this pull request Jun 10, 2026

docs: document the Gym to RL framework token-ID data interface #1554

Merged

ajaymittur closed this Jun 10, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(vllm_model): preserve required_prefix_token_ids#1545

fix(vllm_model): preserve required_prefix_token_ids#1545
ajaymittur wants to merge 1 commit into
NVIDIA-NeMo:mainfrom
ajaymittur:fix/required-prefix-token-ids-forwarding

ajaymittur commented Jun 6, 2026

Uh oh!

copy-pr-bot Bot commented Jun 6, 2026

Uh oh!

cmunley1 commented Jun 8, 2026

Uh oh!

ajaymittur commented Jun 9, 2026

Uh oh!

cmunley1 commented Jun 9, 2026

Uh oh!

ajaymittur commented Jun 9, 2026

Uh oh!

ajaymittur commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ajaymittur commented Jun 6, 2026

Summary

Why

Changes

Related

Uh oh!

copy-pr-bot Bot commented Jun 6, 2026

Uh oh!

cmunley1 commented Jun 8, 2026

Uh oh!

ajaymittur commented Jun 9, 2026

Uh oh!

cmunley1 commented Jun 9, 2026

Uh oh!

ajaymittur commented Jun 9, 2026

Uh oh!

ajaymittur commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants