Skip to content

fix(vllm_model): preserve required_prefix_token_ids#1545

Closed
ajaymittur wants to merge 1 commit into
NVIDIA-NeMo:mainfrom
ajaymittur:fix/required-prefix-token-ids-forwarding
Closed

fix(vllm_model): preserve required_prefix_token_ids#1545
ajaymittur wants to merge 1 commit into
NVIDIA-NeMo:mainfrom
ajaymittur:fix/required-prefix-token-ids-forwarding

Conversation

@ajaymittur

Copy link
Copy Markdown

Summary

Preserve required_prefix_token_ids when Gym proxies Chat Completions requests to vLLM and when it falls back to /tokenize to recover prompt token IDs.

This field can be passed through OpenAI extra_body by RL agents that need exact multi-turn token continuity. Without declaring the field in Gym's request model, Pydantic drops it during model_dump(exclude_unset=True). Without forwarding it to /tokenize, Gym can still compute unrepaired prompt token IDs even if generation used the repaired prefix.

Why

NeMo-RL checks that generated multi-turn trajectories remain token-contiguous with previous messages. For tool-use trajectories, decoded assistant/tool-call text can retokenize differently from the original generated token IDs.

NeMo-RL's vLLM OpenAI preprocessing hook uses _replace_prefix_tokens to splice the exact prior model-emitted prefix back into the templated prompt. Gym needs to preserve the externally supplied required_prefix_token_ids field so that both generation and tokenize fallback use the same prefix repair path.

Changes

  • Add required_prefix_token_ids to NeMoGymChatCompletionCreateParamsNonStreaming.
  • Forward required_prefix_token_ids into the vLLM /tokenize fallback used when top-level prompt_token_ids are absent from the chat completion response.
  • Add focused tests for schema preservation and tokenize fallback forwarding.

Related

Related to the NeMo-RL vLLM worker prefix-token repair path:
https://github.com/NVIDIA-NeMo/RL/blob/main/nemo_rl/models/generation/vllm/vllm_worker_async.py

@copy-pr-bot

copy-pr-bot Bot commented Jun 6, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@nemo-automation-bot nemo-automation-bot Bot added the community-request Issue reported or requested by someone from the community label Jun 6, 2026
Signed-off-by: Ajay Mittur <amittur@nvidia.com>
@ajaymittur ajaymittur force-pushed the fix/required-prefix-token-ids-forwarding branch from 075781e to 46bc5c0 Compare June 6, 2026 08:52
@svcnvidia-nemo-ci svcnvidia-nemo-ci added the waiting-on-maintainers Waiting on maintainers to respond label Jun 8, 2026
@cmunley1

cmunley1 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Thanks for the PR!

Could you compare to here? #1294

I think that required_prefix_token_ids is reconstructed from prompt_token_ids and generation_token_ids here https://github.com/NVIDIA-NeMo/RL/blob/main/nemo_rl/models/generation/vllm/vllm_worker_async.py#L391 so there is no mismatch in either chat completions or tokenize call.

I am not sure that this top level field is sent as-is from Gym today, hence 1294 'auto-derives' it.

do you have an example of token id mismatch or motivation for this PR?
e.g. hitting an assertion such as https://github.com/NVIDIA-NeMo/RL/blob/main/nemo_rl/environments/nemo_gym.py#L301

@cmunley1 cmunley1 requested a review from bxyu-nvidia June 8, 2026 23:57
@svcnvidia-nemo-ci svcnvidia-nemo-ci added waiting-on-customer Waiting on the original author to respond and removed waiting-on-maintainers Waiting on maintainers to respond labels Jun 9, 2026
@ajaymittur

Copy link
Copy Markdown
Author

Thanks for taking a look. Yes gym does not send the field in the top level and the API between Nemo Gym and Nemo RL was a little bit unclear since the field is not set as a private or internal variable (like _required_prefix_token_ids) in Nemo RL. Motivation for this MR was sending required_prefix_token_ids from my custom harness in the Gym and still hitting the nemo Rl assertion you linked.

Discussed this with @bxyu-nvidia offline and think it would be better to keep a single source of origin ( prompt_token_ids and generation_token_ids ) for Nemo RL to derive the prefix from. It looks like #1294 attempts to auto derive it primarily for dynamo

Happy to close this if we don't want to allow sending required_prefix_token_ids in the Nemo gym chat completion response to Nemo RL

@cmunley1

cmunley1 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Thanks @ajaymittur

Motivation for this MR was sending required_prefix_token_ids from my custom harness

Are you saying that your harness does not inject the fields prompt_token_ids and generation_token_ids to requests so the assertion is hit?

Here is an example of how we have to similarly handle this for other harness such as Hermes Agent NousResearch/hermes-agent@main...nemo-gym-changes

I do think we need a better solution than these changes on all harnesses (as it is not possible for all, such as claude code), maybe solution is token id caching or auto derive or something

@ajaymittur

Copy link
Copy Markdown
Author

Are you saying that your harness does not inject the fields prompt_token_ids and generation_token_ids to requests so the assertion is hit?

My harness was using them to construct required_prefix_token_ids and send that to Nemo RL since the model_post_init hook there made me think prompt_token_ids and generation_token_ids would be used as fallbacks if required_prefix_token_ids was None. No other reason really.

I do like the token caching and auto derive directions but that would probably require a larger design discussion. I think ProRL does something similar with their ManagedSession which makes integrating harnesses easy according to their docs https://github.com/NVIDIA-NeMo/ProRL-Agent-Server/tree/stable/src/polar/agent#the-harness-contract

We can also make the API interface between Nemo RL and Gym clearer in the docs.

@svcnvidia-nemo-ci svcnvidia-nemo-ci removed the waiting-on-customer Waiting on the original author to respond label Jun 9, 2026
@ajaymittur

Copy link
Copy Markdown
Author

Closing in favor of #1554

@ajaymittur ajaymittur closed this Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-request Issue reported or requested by someone from the community

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants